class: center, middle, inverse, title-slide

# Causality
## Lecture 11
### Louis SIRUGUE
### CPES 2 - Fall 2022

---

<style>
.left-column {width: 65%;}
.right-column {width: 35%;}
</style>

### Quick reminder
#### Data generating process

<ul>
<li>In practice we estimate coefficients on a <b>given realization of a data generating process</b></li>
<ul>
<li>So the <b>true coefficient</b> is <b>unobserved</b></li>
<li>But our <b>estimation</b> is <b>informative</b> about the values the true coefficient is likely to take</li>
</ul>
</ul>

.left-column[
<img src="slides_files/figure-html/unnamed-chunk-2-1.png" width="90%" style="display: block; margin: auto auto auto 0;" />
]

.right-column[
<p style = "margin-bottom:3cm"></p>
`$$\frac{\hat{\beta}-\beta}{\text{SD}(\hat{\beta})} \sim \mathcal{N}(0, 1)$$`
]

---

### Quick reminder
#### Confidence interval

<ul>
<li>This allows us to infer a <b>confidence interval:</b></li>
</ul>

`$$\hat{\beta}\pm t(\text{df})_{1-\frac{\alpha}{2}}\times\text{se}(\hat{\beta})$$`

<p style = "margin-bottom:1.5cm;"></p>

--

<ul>
<li>Where \(t(\text{df})_{1-\frac{\alpha}{2}}\) is the value from a <b>Student \(t\) distribution</b></li>
<ul>
<li>With the relevant number of <b>degrees of freedom</b> \(\text{df}\) (n - #parameters)</li>
<li>And the desired <b>confidence level</b> \(1-\alpha\)</li>
</ul>
</ul>

<p style = "margin-bottom:1.5cm;"></p>

--

<ul>
<li>And where \(\text{se}(\hat{\beta})\) denotes the <b>standard error</b> of \(\hat{\beta}\):</li>
</ul>

`$$\text{se}(\hat{\beta}) = \sqrt{\widehat{\text{Var}(\hat{\beta})}} = \sqrt{\frac{\sum_{i = 1}^n\hat{\varepsilon}_i^2}{(n-\#\text{parameters})\sum_{i = 1}^n(x_i-\bar{x})^2}}$$`

---

### Quick reminder
#### P-value

<ul>
<li>It also allows us to <b>test</b> how likely it is that \(\beta\) <b>differs from a given value:</b></li>
<ul>
<li>If the <b>p-value</b> < 5%, we can <b>reject</b> that \(\beta\) equals the <b>hypothesized value</b> at the 95% confidence level</li>
<li>This threshold, very common in Economics, implies a 1 in 20 chance of rejecting a null hypothesis that is actually true</li>
</ul>
</ul>

--

```r
linearHypothesis(lm(ige ~ gini, ggcurve), "gini = 0")
```

```
## Linear hypothesis test
## 
## Hypothesis:
## gini = 0
## 
## Model 1: restricted model
## Model 2: ige ~ gini
## 
##   Res.Df     RSS Df Sum of Sq      F   Pr(>F)   
## 1     21 0.46733                                
## 2     20 0.26883  1    0.1985 14.767 0.001016 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

<h3>Today: Causality</h3>

--

<p style = "margin-bottom:4.25cm;"></p>

.pull-left[
<ul style = "margin-left:1.5cm;list-style: none">
<li><b>1. Main sources of bias</b></li>
<ul style = "list-style: none">
<li>1.1. Omitted variables</li>
<li>1.2. Functional form</li>
<li>1.3. Selection bias</li>
<li>1.4. Measurement error</li>
<li>1.5. Simultaneity</li>
</ul>
</ul>
]

.pull-right[
<ul style = "margin-left:-1cm;list-style: none">
<li><b>2. Randomized control trials</b></li>
<ul style = "list-style: none">
<li>2.1. Introduction to RCTs</li>
<li>2.2. Types of randomization</li>
<li>2.3. Multiple testing</li>
</ul>
</ul>
<p style = "margin-bottom:.65cm;"></p>
<ul style = "margin-left:-1cm;list-style: none"><li><b>3. Wrap up!</b></li></ul>
]

---

<h3>Today: Causality</h3>

<p style = "margin-bottom:4.25cm;"></p>

.pull-left[
<ul style = "margin-left:1.5cm;list-style: none">
<li><b>1. Main sources of bias</b></li>
<ul style = "list-style: none">
<li>1.1. Omitted variables</li>
<li>1.2. Functional form</li>
<li>1.3. Selection bias</li>
<li>1.4. Measurement error</li>
<li>1.5. Simultaneity</li>
</ul>
</ul>
]

---
### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>Consider the following regression:</li>
<ul>
<li>Where \(\text{Earnings}_i\) denotes individuals' annual labor earnings</li>
<li>And \(\text{Education}_i\) stands for individuals' number of years of education</li>
</ul>
</ul>

`$$\text{Earnings}_i = \alpha + \beta \times \text{Education}_i + \varepsilon_i$$`

--

```r
summary(lm(Earnings ~ Education, sim_dat))$coefficients
```

```
##             Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 7514.800  2994.3060  2.509697 1.209949e-02
## Education   2643.312   205.2692 12.877294 1.220064e-37
```

--

<p style = "margin-bottom:1cm;"></p>

<ul>
<li>Taking \(\hat{\beta}\) at face value, the <b>"expected returns"</b> from an additional year of education amount to $2,643/year</li>
<ul>
<li>But if we were to enforce an additional year of education for randomly selected individuals, would they earn $2,643 more than they would have earned otherwise?</li>
</ul>
</ul>

--

<center><i>➜ The answer is <b>no</b>, because the estimated effect is <b>not causal!</b></i></center>

---

### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>The estimated relationship could be partly driven by some <b>confounding factors:</b></li>
<ul>
<li>Maybe <b>more skilled</b> individuals both <b>study longer</b> and <b>earn more</b> because they are skilled</li>
<li>Even without the additional education, they would still earn more because they are skilled</li>
</ul>
</ul>

--

<p style = "margin-bottom:1.25cm;"></p>

<ul>
<li>The skill variable acts as a <b>confounding factor</b> because it is correlated with both \(x\) and \(y\)</li>
<ul>
<li>This would also be the case for parental socio-economic status and many other variables</li>
<li>We need to include these variables in the regression as <b>control variables</b></li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

`$$\text{Earnings}_i = \alpha + \beta_1\times \text{Education}_i + \beta_2\times\text{Skills}_i + \varepsilon_i$$`

<p style = "margin-bottom:1.25cm;"></p>

<ul>
<li>In your view, would the estimated effect of education be higher or lower in this regression?</li>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

<center><i>➜ If skills are indeed <b>positively correlated with both</b> education and earnings, the new coefficient will be <b>lower</b></i></center>

---

### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>Remember that <b>controlling</b> for a variable can be viewed as:</li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-7-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>Remember that <b>controlling</b> for a variable can be viewed as:</li>
<ul>
<li>Allowing the <b>intercept</b> to <b>vary</b> with that variable</li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-8-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>Remember that <b>controlling</b> for a variable can be viewed as:</li>
<ul>
<li>Allowing the <b>intercept</b> to <b>vary</b> with that variable</li>
<li>Keeping this <b>variable constant</b> as we move along the \(x\)-axis</li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-9-1.png" width="75%" style="display: block; margin: auto;" />

---
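### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>A short simulation makes the bias concrete <i>(a minimal sketch on made-up data, not the <code>sim_dat</code> used above: here skills drive both education and earnings by construction)</i></li>
</ul>

```r
set.seed(1)

# Hypothetical DGP: skills raise education AND earnings,
# and the true return to education is 2,000
sim_ovb <- tibble(skills    = rnorm(1000),
                  education = 12 + 2 * skills + rnorm(1000),
                  earnings  = 10000 + 2000 * education + 5000 * skills +
                              rnorm(1000, 0, 1000))

# Omitting skills: the coefficient also picks up part of the effect of skills
lm(earnings ~ education, sim_ovb)$coefficients

# Controlling for the confounder recovers a coefficient close to 2,000
lm(earnings ~ education + skills, sim_ovb)$coefficients
```

<ul>
<li>The first estimate should be well above 2,000, the second close to it</li>
</ul>

---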
### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>In that case the <b>confounding</b> variable <b>no longer affects</b> our relationship of interest</li>
<ul>
<li>It accounts for the fact that more skilled individuals tend to have both higher education and earnings</li>
<li>Such that the <b>relationship</b> between education and earnings is <b>net of the effect of skills</b></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-10-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.1. Omitted variable bias

<ul>
<li>But <b>we are never able to control for all</b> potential confounding factors</li>
<ul>
<li>We can almost always think of variables that may affect both \(x\) and \(y\) but that are not in the data</li>
<li>Resulting in what is called the <b>omitted variable bias</b></li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>In that case you should either:</li>
<ul>
<li>Use econometric techniques for causal identification (not covered in this course, except RCTs)</li>
<li>Acknowledge that your estimated effect is not causal with the phrase <i><b>"ceteris paribus"</b></i></li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li><i>Ceteris paribus</i> means <b>"everything else equal"</b></li>
<ul>
<li>We use this phrase to indicate that our <b>estimation is correct under the hypothesis that</b> when our \(x\) of interest moves, <b>no confounding factor</b> affecting \(y\) moves with it</li>
<li>Indeed, if no other variable varies with both \(x\) and \(y\), our regression doesn't need more controls</li>
<li>We know this assumption is <b>not correct</b>, but it is <b>important to be transparent and clear</b> about what the coefficient means</li>
</ul>
</ul>

---

### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>Now consider the following relationship between years of education and earnings</li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-11-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>Now consider the following relationship between years of education and earnings</li>
<ul>
<li>We can fit a regression line as we usually do</li>
<li>But would that be an appropriate estimation?</li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-12-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>We must capture the <b>non-linearity</b></li>
<ul>
<li>The relationship cannot be correctly captured by a straight line</li>
<li></li>
</ul>
</ul>

<p style = "margin-bottom:.75cm;"></p>

`$$\text{Earnings}_i = \alpha + \beta_1\times \text{Education}_i + \varepsilon_i$$`

---

### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>We must capture the <b>non-linearity</b></li>
<ul>
<li>The relationship cannot be correctly captured by a straight line</li>
<li>It has the shape of a <b>polynomial of degree 2</b></li>
</ul>
</ul>

<p style = "margin-bottom:.75cm;"></p>

`$$\text{Earnings}_i = \alpha + \beta_1\times \text{Education}_i + \color{SkyBlue}{\beta_2\times\text{Education}^2_i} + \varepsilon_i$$`

--

<p style = "margin-bottom:1.25cm;"></p>

<ul>
<li>Given the previous graph, what would be the signs of \(\hat{\beta}_1\) and \(\hat{\beta}_2\)?</li>
</ul>

---
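### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>In lm(), the squared term can be added with I() <i>(a sketch assuming the same <code>quadratic</code> data as in the next slides; output not shown)</i></li>
</ul>

```r
# I() makes R square Education before estimating,
# instead of interpreting ^ as a formula operator
lm(Earnings ~ Education + I(Education^2), quadratic)

# An equivalent specification with raw polynomials
lm(Earnings ~ poly(Education, 2, raw = TRUE), quadratic)
```

---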
### 1. Main sources of bias
#### 1.2. Functional form

<ul>
<li>We must capture the <b>non-linearity</b></li>
<ul>
<li>The relationship cannot be correctly captured by a straight line</li>
<li>It has the shape of a <b>polynomial of degree 2</b></li>
</ul>
</ul>

<p style = "margin-bottom:.75cm;"></p>

`$$\text{Earnings}_i = \alpha + \beta_1\times \text{Education}_i + \color{SkyBlue}{\beta_2\times\text{Education}^2_i} + \varepsilon_i$$`

<p style = "margin-bottom:1.25cm;"></p>

<ul>
<li>Given the previous graph, what would be the signs of \(\hat{\beta}_1\) and \(\hat{\beta}_2\)?</li>
<ul>
<li>\(\hat{\beta}_1\) would be positive because the relationship is increasing</li>
<li>\(\hat{\beta}_2\) would be negative because the relationship is concave</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>Polynomial functional forms are easy to handle in R</li>
<ul>
<li>You can <b>square the explanatory variable and add it</b> in lm()</li>
<li>geom_smooth() also allows you to plot a polynomial fit</li>
</ul>
</ul>

---

### 1. Main sources of bias
#### 1.2. Functional form

```r
ggplot(quadratic, aes(x = Education, y = Earnings)) + 
  geom_point() +
  geom_smooth(method = "lm")
```

.left-column[
<img src="slides_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />
]

---

### 1. Main sources of bias
#### 1.2. Functional form

```r
ggplot(quadratic, aes(x = Education, y = Earnings)) + 
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2))
```

.left-column[
<img src="slides_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />
]

--

.right-column[
<p style = "margin-bottom:2cm;"></p>
<ul>
<li>But functional form is not only about polynomial degrees:</li>
<ul>
<li>Interactions</li>
<li>Logs</li>
<li>Discretization</li>
<li>...</li>
</ul>
</ul>
]

---

### 1. Main sources of bias
#### 1.3. Selection bias

<ul>
<li>Now remember the example on high-school grades and job application acceptance</li>
<ul>
<li>We plotted the <b>grades</b> of individuals on the \(x\)-axis</li>
<li>And <b>whether</b> or not <b>they got the job</b> on the \(y\)-axis</li>
</ul>
</ul>

<p style = "margin-bottom:1cm;"></p>

.left-column[
<p style = "margin-bottom:-1cm;"></p>
<img src="slides_files/figure-html/unnamed-chunk-17-1.png" width="90%" style="display: block; margin: auto;" />
]

--

.right-column[
<p style = "margin-bottom:-.25cm;"></p>
<ul>
<li>We estimated that a <b>1 unit</b> increase in Grade (/20) would <b>increase the probability</b> of being accepted by about <b>a third</b> on expectation, <b>ceteris paribus</b></li>
</ul>
<ul>
<li>Is this estimation relevant?</li>
<ul>
<li>Look at the support of \(x\)</li>
</ul>
</ul>
]

---
### 1. Main sources of bias
#### 1.3. Selection bias

<ul>
<li>The fact that almost all grades range between 13 and 17 hints at a <b>selection problem:</b></li>
<ul>
<li>Individuals with very <b>low grades won't apply</b> to the position because <b>they know they will be rejected</b></li>
<li>Individuals with very <b>high grades won't apply</b> to the position because <b>they apply to better positions</b></li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

.left-column[
<p style = "margin-bottom:-1cm;"></p>
<img src="slides_files/figure-html/unnamed-chunk-18-1.png" width="90%" style="display: block; margin: auto;" />
]

.right-column[
<ul>
<li>Had these individuals applied, the estimated effect would be lower</li>
</ul>
<ul>
<li>Our coefficient is specific to a non-representative sample</li>
<ul>
<li>Issue of <b>external validity</b></li>
<li>The interpretation only holds in our specific setting</li>
</ul>
</ul>
]

---

### 1. Main sources of bias
#### 1.3. Selection bias

<ul>
<li>Such <b>selection problems</b> are very common <b>threats to causality</b></li>
</ul>

--

<p style = "margin-bottom:.75cm;"></p>

<ul>
<li>What is the impact of going to a better neighborhood on your children's outcomes?</li>
<ul>
<li>Those who move may be different from those who stay: <b>self-selection issue</b></li>
<li>Here it is not that the sample is not representative of the population, but that <b>the outcomes of those who stayed are different from the outcomes those who moved would have had, if they had stayed</b></li>
</ul>
</ul>

--

<p style = "margin-bottom:.75cm;"></p>

<ul>
<li>This relates to the notion of <b>counterfactual</b></li>
<ul>
<li>If those who moved were comparable to those who stayed, it would be valid to use the outcome of those who stayed as the counterfactual outcome of those who moved</li>
<li>But because of selection, movers are not comparable to stayers, so we don't have a credible counterfactual</li>
</ul>
</ul>

--

<p style = "margin-bottom:.75cm;"></p>

<ul>
<li>The notion of counterfactual is key to answering many questions:</li>
<ul>
<li>What is the impact of an immigrant inflow on the labor market outcomes of locals?</li>
<li>We need to know how the labor market outcomes of locals would have evolved absent the immigrant inflow, but we do not observe this situation</li>
</ul>
</ul>

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Another way of obtaining <b>biased estimates</b> is to have an <b>independent variable measured with error</b></li>
<ul>
<li>For instance if you want to measure the effect of cognitive skills but you only have IQ scores</li>
<li>IQ is a noisy measure of cognitive skills, as individuals' performance on such tests is not always consistent</li>
</ul>
</ul>

--

<ul>
<li>It seems reasonable to assume that the measurement error follows a normal distribution:</li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-19-1.png" width="50%" style="display: block; margin: auto;" />

---
### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Another way of obtaining <b>biased estimates</b> is to have an <b>independent variable measured with error</b></li>
<ul>
<li>For instance if you want to measure the effect of cognitive skills but you only have IQ scores</li>
<li>IQ is a noisy measure of cognitive skills, as individuals' performance on such tests is not always consistent</li>
</ul>
</ul>

<ul>
<li>It seems reasonable to assume that the measurement error follows a normal distribution:</li>
<ul>
<li>Individuals <b>usually</b> perform <b>close to their average</b> performance</li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-20-1.png" width="50%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Another way of obtaining <b>biased estimates</b> is to have an <b>independent variable measured with error</b></li>
<ul>
<li>For instance if you want to measure the effect of cognitive skills but you only have IQ scores</li>
<li>IQ is a noisy measure of cognitive skills, as individuals' performance on such tests is not always consistent</li>
</ul>
</ul>

<ul>
<li>It seems reasonable to assume that the measurement error follows a normal distribution:</li>
<ul>
<li>Individuals <b>usually</b> perform <b>close to their average</b> performance</li>
<li>And <b>larger deviations</b> are <b>rarer</b></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-21-1.png" width="50%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.4. Measurement error

<p style = "margin-bottom:1cm;"></p>

.pull-left[
<center>Denote \(x\) the IQ variable</center>
`$$x \sim \mathcal{N}(100,\, 15^2)$$`
]

.pull-right[
<center>Denote \(\eta\) the measurement error</center>
`$$\eta \sim \mathcal{N}(0,\, 1)$$`
]

--

<p style = "margin-bottom:1cm;"></p>

* The true relationship is

`$$y = \alpha + \beta x + \varepsilon$$`

--

* But we only observe

`$$\tilde{x} = x + \eta$$`

--

* So we can only estimate:

`$$y = \alpha + \beta \tilde{x} + \varepsilon \,\,\, \Longleftrightarrow \,\,\, y = \alpha + \beta (x + \eta) + \varepsilon$$`

--

<p style = "margin-bottom:1cm;"></p>

<center><i>➜ Let's <b>use simulations</b> to see how it may affect our estimation</i></center>

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>We can start by <b>generating a relationship</b> without measurement error</li>
</ul>

`$$y_i = 1 + 2 x_i + \varepsilon_i,\, \text{with}\, \varepsilon \sim \mathcal{N}(0,\, 1)$$`

```r
dat <- tibble(x = rnorm(1000, 100, 15),
              y = 1 + (2 * x) + rnorm(1000, 0, 1))
```

--

<p style = "margin-bottom:1cm;"></p>

.pull-left[
<ul>
<li>Estimate the <b>unbiased</b> relationship</li>
</ul>

```r
lm(y ~ x, dat)$coefficient
```

```
## (Intercept)           x 
##    0.824755    2.001394
```

<p style = "margin-bottom:1cm;"></p>

Is it just random chance or is `\(\hat{\beta}\)` downward biased? ➜
]

.pull-right[
<ul>
<li>And <b>with measurement error</b> \(\eta \sim \mathcal{N}(0,\, 1)\)</li>
</ul>

```r
dat <- dat %>% mutate(noisy_x = x + rnorm(1000, 0, 1))
lm(y ~ noisy_x, dat)$coefficient
```

```
## (Intercept)     noisy_x 
##    1.995596    1.990358
```
]

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

--

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

#
#

#
#

#
#

#
#

#
#

#
```

---
### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

# Empty vector for beta...
beta <- c()

#
#

#
#

#
#

#
#

#
```

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

# Empty vector for beta...
beta <- c()

# ... to be filled in a loop
for (i in sd_noise) {
  #
  #
  
  #
  #
  
  #
  #
}
```

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

# Empty vector for beta...
beta <- c()

# ... to be filled in a loop
for (i in sd_noise) {
  
  # Generate noisy x with corresponding SD(eta)
  dat_i <- dat %>% mutate(noisy_x = x + rnorm(1000, 0, i))
  
  #
  #
  
  #
  #
}
```

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

# Empty vector for beta...
beta <- c()

# ... to be filled in a loop
for (i in sd_noise) {
  
  # Generate noisy x with corresponding SD(eta)
  dat_i <- dat %>% mutate(noisy_x = x + rnorm(1000, 0, i))
  
  # Estimate the regression
  beta_i <- lm(y ~ noisy_x, dat_i)$coefficient[2]
  
  #
  #
}
```

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>Let's have a look at how \(\hat{\beta}\) behaves with an increasingly large \(\text{SD}(\eta)\)</li>
</ul>

```r
# Vector of standard deviations from 0 to 20
sd_noise <- 0:20

# Empty vector for beta...
beta <- c()

# ... to be filled in a loop
for (i in sd_noise) {
  
  # Generate noisy x with corresponding SD(eta)
  dat_i <- dat %>% mutate(noisy_x = x + rnorm(1000, 0, i))
  
  # Estimate the regression
  beta_i <- lm(y ~ noisy_x, dat_i)$coefficient[2]
  
  # Store the coefficient
  beta <- c(beta, beta_i)
}
```

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>We can then plot the \(\hat{\beta}\) for each value of \(\text{SD}(\eta)\)</li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-31-1.png" width="75%" style="display: block; margin: auto;" />

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>We can then plot the \(\hat{\beta}\) for each value of \(\text{SD}(\eta)\)</li>
<ul>
<li>It is clear that the <b>measurement error</b> puts <b>downward pressure</b> on our estimate</li>
<li>And that the <b>noisier</b> the measure of \(x\), the <b>larger</b> the <b>bias</b></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-32-1.png" width="75%" style="display: block; margin: auto;" />

---
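### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>For reference, here is one way such a figure can be drawn from the simulated vectors <i>(a sketch; not necessarily the exact code behind the previous chart)</i></li>
</ul>

```r
# Combine the simulation results and plot beta against SD(eta),
# with a dashed line at the true coefficient of 2
tibble(sd_noise, beta) %>% 
  ggplot(aes(x = sd_noise, y = beta)) +
  geom_hline(yintercept = 2, linetype = "dashed") +
  geom_point() +
  geom_line() +
  labs(x = "SD of the measurement error", y = "Estimated coefficient")
```

---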
### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>And this phenomenon can easily be shown <b>mathematically:</b></li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

`$$\hat{\beta} = \frac{\text{Cov}(\tilde{x},\, y)}{\text{Var}(\tilde{x})}$$`

<p style = "margin-bottom:.8cm;"></p>

--

`$$\hat{\beta} = \frac{\text{Cov}(x + \eta,\, y)}{\text{Var}(x + \eta)}$$`

<p style = "margin-bottom:.8cm;"></p>

--

`$$\hat{\beta} = \frac{\text{Cov}(x,\, y) + \text{Cov}(\eta,\, y)}{\text{Var}(x) + \text{Var}(\eta) + 2\text{Cov}(x,\, \eta)}$$`

--

<p style = "margin-bottom:.8cm;"></p>

Since the noise \(\eta\) is independent of \(x\) and \(\varepsilon\), \(\text{Cov}(\eta,\, y) = \text{Cov}(x,\, \eta) = 0\), so that:

`$$\hat{\beta} = \frac{\text{Cov}(x,\, y)}{\text{Var}(x) + \text{Var}(\eta)}$$`

---

### 1. Main sources of bias
#### 1.4. Measurement error

<ul>
<li>And this phenomenon can easily be shown <b>mathematically:</b></li>
<ul>
<li>The extra term in the denominator puts <b>downward pressure</b> on our estimate</li>
<li>And the bias is <b>increasing</b> in the amplitude of the <b>measurement error</b></li>
</ul>
</ul>

`$$\hat{\beta} = \frac{\text{Cov}(\tilde{x},\, y)}{\text{Var}(\tilde{x})}$$`

<p style = "margin-bottom:.8cm;"></p>

`$$\hat{\beta} = \frac{\text{Cov}(x + \eta,\, y)}{\text{Var}(x + \eta)}$$`

<p style = "margin-bottom:.8cm;"></p>

`$$\hat{\beta} = \frac{\text{Cov}(x,\, y) + \text{Cov}(\eta,\, y)}{\text{Var}(x) + \text{Var}(\eta) + 2\text{Cov}(x,\, \eta)}$$`

<p style = "margin-bottom:.8cm;"></p>

Since the noise \(\eta\) is independent of \(x\) and \(\varepsilon\), \(\text{Cov}(\eta,\, y) = \text{Cov}(x,\, \eta) = 0\), so that:

`$$\hat{\beta} = \frac{\text{Cov}(x,\, y)}{\text{Var}(x) + \color{SkyBlue}{\text{Var}(\eta)}}$$`

---

### 1. Main sources of bias
#### 1.5. Simultaneity

<ul>
<li><b>So far</b> we have considered relationships whose <b>directions</b> were quite <b>unambiguous</b></li>
<ul>
<li>Education ➜ Earnings, and not the opposite</li>
<li>High-school grades ➜ Job acceptance, and not the opposite</li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

<center><i>But now consider the relationship between <b>crime rate and police coverage</b> intensity</i></center>

<p style = "margin-bottom:1.25cm;"></p>

<ul>
<li><b>What is the direction</b> of the relationship?</li>
<ul>
<li>It's likely that more crime would cause a positive response in police activity</li>
<li>But also that police activity would deter crime</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>There is no easy solution to this problem apart from:</li>
<ul>
<li>Working out a <b>theoretical model</b> that sorts out this issue beforehand</li>
<li>Or <b>designing an RCT</b> that cuts one of the two channels</li>
</ul>
</ul>

---

<h3>Overview: Causality</h3>

<p style = "margin-bottom:4.25cm;"></p>

.pull-left[
<ul style = "margin-left:1.5cm;list-style: none">
<li><b>1. Main sources of bias ✔</b></li>
<ul style = "list-style: none">
<li>1.1. Omitted variables</li>
<li>1.2. Functional form</li>
<li>1.3. Selection bias</li>
<li>1.4. Measurement error</li>
<li>1.5. Simultaneity</li>
</ul>
</ul>
]

.pull-right[
<ul style = "margin-left:-1cm;list-style: none">
<li><b>2. Randomized control trials</b></li>
<ul style = "list-style: none">
<li>2.1. Introduction to RCTs</li>
<li>2.2. Types of randomization</li>
<li>2.3. Multiple testing</li>
</ul>
</ul>
<p style = "margin-bottom:.65cm;"></p>
<ul style = "margin-left:-1cm;list-style: none"><li><b>3. Wrap up!</b></li></ul>
]

---
<h3>Overview: Causality</h3>

<p style = "margin-bottom:4.25cm;"></p>

.pull-left[
<ul style = "margin-left:1.5cm;list-style: none">
<li><b>1. Main sources of bias ✔</b></li>
<ul style = "list-style: none">
<li>1.1. Omitted variables</li>
<li>1.2. Functional form</li>
<li>1.3. Selection bias</li>
<li>1.4. Measurement error</li>
<li>1.5. Simultaneity</li>
</ul>
</ul>
]

.pull-right[
<ul style = "margin-left:-1cm;list-style: none">
<li><b>2. Randomized control trials</b></li>
<ul style = "list-style: none">
<li>2.1. Introduction to RCTs</li>
<li>2.2. Types of randomization</li>
<li>2.3. Multiple testing</li>
</ul>
</ul>
]

---

### 2. Randomized control trials
#### 2.1. Introduction to RCTs

<ul>
<li>A Randomized Controlled Trial (RCT) is a type of <b>experiment</b> in which the thing we want to know the impact of (called the treatment) is <b>randomly allocated</b> in the population</li>
<ul>
<li>It is a way to obtain causality from randomness</li>
</ul>
</ul>

--

<p style = "margin-bottom:.85cm;"></p>

<ul>
<li>RCTs are very powerful tools to <b>sort out issues of:</b></li>
<ul>
<li>Omitted variables</li>
<li>Selection bias</li>
<li>Simultaneity</li>
</ul>
</ul>

--

<p style = "margin-bottom:.85cm;"></p>

<ul>
<li>This method is particularly used to <b>identify causal relationships</b> in:</li>
<ul>
<li>Medicine</li>
<li>Psychology</li>
<li>Economics</li>
<li>...</li>
</ul>
</ul>

<p style = "margin-bottom:.75cm;"></p>

--

<center><i><b>But how does randomness help us obtain causality?</b></i></center>

---

### 2. Randomized control trials
#### 2.1. Introduction to RCTs

<ul>
<li>Consider estimating the <b>effect of vitamin</b> supplement intake<b> on health</b></li>
<ul>
<li>Comparing health outcomes of vitamin <b>consumers vs. non-consumers</b>, the effect <b>won't be causal</b></li>
<li>Vitamin consumers might be <b>richer</b> and <b>healthier in general</b>, for reasons other than vitamin intake</li>
</ul>
</ul>

--

<ul>
<li><b>Randomization</b> allows us to <b>solve</b> this selection <b>bias</b></li>
<ul>
<li>If you form two groups at random, they would have the <b>same characteristics</b> on expectation</li>
<li>And thus they would be perfectly <b>comparable</b></li>
</ul>
</ul>

--

Take for instance the `asec_2020.csv` dataset we've been working with:

```r
asec_2020 %>% 
  summarise(Earnings = mean(Earnings),
            Hours = mean(Hours),
            Black = mean(Race == "Black"),
            Asian = mean(Race == "Asian"),
            Other = mean(Race == "Other"),
            Female = mean(Sex == "Female"))
```

```
##   Earnings    Hours     Black     Asian      Other    Female
## 1 62132.37 39.54742 0.1062391 0.0703805 0.03764611 0.4809749
```

---

### 2. Randomized control trials
#### 2.1. Introduction to RCTs

<ul>
<li>Let's compare the <b>average characteristics</b> for two <b>randomly selected groups:</b></li>
</ul>

--

```r
asec_2020 %>% 
* mutate(Group = ifelse(rnorm(n(), 0, 1) > 0, "Treatment", "Control")) %>%
  group_by(Group) %>% 
  summarise(n = n(),
            Earnings = mean(Earnings),
            Female = 100 * mean(Sex == "Female"),
            Black = 100 * mean(Race == "Black"),
            Asian = 100 * mean(Race == "Asian"),
            Other = 100 * mean(Race == "Other"),
            Hours = mean(Hours))
```

--

```
## # A tibble: 2 x 8
##   Group         n Earnings Female Black Asian Other Hours
##   <chr>     <int>    <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Control   32195   62234.   48.2  10.7  7.02  3.80  39.5
## 2 Treatment 32141   62030.   48.0  10.5  7.05  3.73  39.6
```

---
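### 2. Randomized control trials
#### 2.1. Introduction to RCTs

<ul>
<li>We can also <b>test</b> such differences formally <i>(a sketch; with a genuinely random split, a t-test should typically fail to reject equal means)</i></li>
</ul>

```r
# Draw random groups, then test whether mean earnings differ between them
balance <- asec_2020 %>% 
  mutate(Group = ifelse(rnorm(n(), 0, 1) > 0, "Treatment", "Control"))

# A large p-value means no detectable difference in means across groups
t.test(Earnings ~ Group, data = balance)
```

---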
### 2. Randomized control trials
#### 2.1. Introduction to RCTs

<ul>
<li>Their average <b>characteristics</b> are very close!</li>
<ul>
<li><b>On expectation</b> their average characteristics are <b>the same</b></li>
</ul>
</ul>

--

<ul>
<li>And just as the two randomly selected populations are comparable in terms of their observable characteristics</li>
<ul>
<li>On expectation they are also <b>comparable</b> in terms of their <b>unobservable characteristics!</b></li>
<li>Randomization, if properly conducted, thus solves the problem of omitted variable bias</li>
</ul>
</ul>

--

<center><h4><i>If we assign a treatment to Group 1, Group 2 would then be a valid counterfactual to estimate a causal effect!</i></h4></center>

--

<ul>
<li>But <b>RCTs are not immune</b> to every problem:</li>
<ul>
<li>If individuals <b>self-select</b> into participating in the experiment, there would be a selection bias</li>
<li>Even without self-selection, if the population among which the treatment is randomized is not <b>representative</b>, there is a problem of external validity</li>
<li>For the RCT to work, individuals should <b>comply</b> with the treatment allocation</li>
<li>The <b>sample</b> must be <b>sufficiently large</b> for the average characteristics across groups to be close enough to their expected values</li>
<li>...</li>
</ul>
</ul>

---

### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>To some extent there are ways to deal with these problems</li>
<ul>
<li>Notably, we can <b>adjust the way the treatment is randomized</b></li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

<ul>
<li>For instance if we want to ensure that a characteristic is well balanced across the two groups, we can <b>randomize within categories of this variable</b></li>
<ul>
<li>We don't give the treatment randomly hoping that we'll obtain the same % of females in both groups</li>
<li>We assign the treatment randomly among females and among males separately</li>
<li>This is called <b>randomizing by block</b></li>
<li><i>Note that this only works with observable characteristics!</i></li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

```r
asec_2020 %>% 
* group_by(Sex) %>% # Randomize treatment by sex
  mutate(Group = ifelse(rnorm(n(), 0, 1) > 0, 1, 0)) %>%
  ungroup() %>% 
  group_by(Group) %>% 
  summarise(...)
```

---

### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>What if you want to estimate the impact of <b>calorie intake</b> at the <b>10am break</b> on <b>pupils' grades</b></li>
<ol>
<li>Find a school to run your experiment</li>
<li>Take the list of pupils and randomly allocate them to treatment and control groups</li>
<li>Provide families with treated pupils a snack for the 10am break every school day</li>
<li>Do that for a few months and collect the data on the grades of both groups</li>
<li>Compute the difference in average grades between the treated and the control group</li>
</ol>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>If the 10am snack has a <b>positive effect:</b></li>
<ul>
<li>This causal identification framework should ensure the correct estimation of that effect</li>
<li>Right?</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>But what about <b>non-compliance?</b></li>
<ul>
<li>It is likely that during the 10am break, treated children share their snack with their untreated friends</li>
<li>How would that <b>affect our estimation?</b></li>
</ul>
</ul>

---
### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>While the observed effect would be positive under full compliance, <b>under treatment sharing:</b></li>
<ul>
<li></li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-37-1.png" width="31%" style="display: block; margin: auto;" />

---

### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>While the observed effect would be positive under full compliance, <b>under treatment sharing:</b></li>
<ul>
<li><b>Treated children</b> would have <b>lower grades</b> because they would benefit from fewer calories</li>
<li></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-38-1.png" width="31%" style="display: block; margin: auto;" />

---

### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>While the observed effect would be positive under full compliance, <b>under treatment sharing:</b></li>
<ul>
<li><b>Treated children</b> would have <b>lower grades</b> because they would benefit from fewer calories</li>
<li><b>Untreated children</b> would have <b>higher grades</b> because they would benefit from more calories</li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-39-1.png" width="31%" style="display: block; margin: auto;" />

---

### 2. Randomized control trials
#### 2.2. Types of randomization

<ul>
<li>Thus <b>non-compliance</b> can bias our estimation</li>
<ul>
<li>There would be a <b>downward bias</b></li>
<li>And our estimation <b>wouldn't be causal</b></li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>One solution to that problem is to <b>randomize by cluster</b></li>
<ul>
<li>Children cannot share their snack with children from other schools</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>We must <b>treat at the school level</b> instead of the child level</li>
<ul>
<li>A treated school is a school where some or all children are treated</li>
<li>An untreated school is a school where no child is treated</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<center>
<i>Beware that in terms of inference, computing standard errors the usual way<br>
while the treatment is assigned at a broader observational level than the outcome<br>
would give misleadingly low standard errors, which would need to be corrected</i>
</center>

---
### 2. Randomized control trials
#### 2.3. Multiple testing

<ul>
<li>Another inference issue that RCTs can be subject to is <b>multiple testing</b></li>
<ul>
<li>If you conduct an RCT you might be tempted to exploit the causal framework to test a myriad of effects</li>
</ul>
</ul>

<p style = "margin-bottom:1.2cm;"></p>

--

<ul>
<li>You randomize your treatment and you compare the averages of many outcomes between treated and untreated individuals</li>
<ul>
<li>You would be tempted to <b>conclude</b> that there is a <b>significant effect</b> for <b>every variable</b> whose corresponding <b>p-value < .05</b></li>
<li>But <b>you cannot do that!</b></li>
</ul>
</ul>

<p style = "margin-bottom:1.2cm;"></p>

--

<ul>
<li>The probability of getting a p-value lower than .05 just by chance is indeed 5% for a single test</li>
<ul>
<li>But if you run <b>multiple tests</b> in a row, the <b>probability</b> of getting a <b>p-value lower than .05</b> for at least one true null effect among these tests is <b>greater than 5%</b></li>
<li>The greater the number of tests, the higher the probability of getting a significant result just by chance: with 10 independent tests, it is already \(1 - 0.95^{10} \approx 40\%\)</li>
</ul>
</ul>

<p style = "margin-bottom:1.2cm;"></p>

--

<center><h4>This is what we call <i>multiple testing</i></h4></center>

---

### 2. Randomized control trials
#### 2.3. Multiple testing

<img src="slides_files/figure-html/unnamed-chunk-40-1.png" width="75%" style="display: block; margin: auto;" />

---

### 2. Randomized control trials
#### 2.3. Multiple testing

* There are many ways to correct for multiple testing

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>The simplest one is called the <b>Bonferroni</b> correction</li>
<ul>
<li>It consists in <b>multiplying each p-value by the number of tests</b></li>
<li>But it also leads to a large <b>loss of power</b> (the probability of finding an effect when there is indeed an effect decreases a lot)</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<ul>
<li>There are more sophisticated ways to deal with the problem, which can be categorized into two approaches</li>
<ul>
<li><b>Family Wise Error Rate</b>: Control the probability of rejecting at least one true null hypothesis</li>
<li><b>False Discovery Rate</b>: Control the share of true null hypotheses among the rejected hypotheses</li>
</ul>
</ul>

<p style = "margin-bottom:1.25cm;"></p>

--

<center><i>➜ We won't cover these methods in this course, but keep the multiple testing issue in mind when you encounter a long series of statistical tests</i></center>

---

<h3>Overview: Causality</h3>

<p style = "margin-bottom:4.25cm;"></p>

.pull-left[
<ul style = "margin-left:1.5cm;list-style: none">
<li><b>1. Main sources of bias ✔</b></li>
<ul style = "list-style: none">
<li>1.1. Omitted variables</li>
<li>1.2. Functional form</li>
<li>1.3. Selection bias</li>
<li>1.4. Measurement error</li>
<li>1.5. Simultaneity</li>
</ul>
</ul>
]

.pull-right[
<ul style = "margin-left:-1cm;list-style: none">
<li><b>2. Randomized control trials ✔</b></li>
<ul style = "list-style: none">
<li>2.1. Introduction to RCTs</li>
<li>2.2. Types of randomization</li>
<li>2.3. Multiple testing</li>
</ul>
</ul>
<p style = "margin-bottom:.65cm;"></p>
<ul style = "margin-left:-1cm;list-style: none"><li><b>3. Wrap up!</b></li></ul>
]

---
### 3. Wrap up!
#### Omitted variable bias

<ul>
<li>If a third <b>variable</b> is correlated with both \(x\) and \(y\), it would <b>bias the relationship</b></li>
<ul>
<li>We must then <b>control</b> for such variables</li>
<li>And if we can't, we must acknowledge that our estimate is not causal with <i><b>'ceteris paribus'</b></i></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-41-1.png" width="75%" style="display: block; margin: auto;" />

---

### 3. Wrap up!
#### Functional form

<ul>
<li>Not capturing the <b>right functional form</b> might also lead to biased estimations:</li>
<ul>
<li>Polynomial order, interactions, logs, and discretization matter</li>
<li><b>Visualizing the relationship</b> is key</li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-42-1.png" width="75%" style="display: block; margin: auto;" />

---

### 3. Wrap up!
#### Selection bias

<ul>
<li><b>Self-selection</b> is also a common threat to causality</li>
</ul>

<p style = "margin-bottom:.75cm;"></p>

<ul>
<li>What is the impact of going to a better neighborhood on your children's outcomes?</li>
<ul>
<li>We cannot just regress children's outcomes on a mobility dummy</li>
<li>Individuals who move may be different from those who stay: <b>self-selection issue</b></li>
<li>Here <b>the outcomes of those who stayed are different from the outcomes those who moved would have had, if they had stayed</b></li>
</ul>
</ul>

--

<p style = "margin-bottom:1.25cm;"></p>

#### Simultaneity

<ul>
<li>Consider the relationship between <b>crime</b> rate and <b>police coverage</b> intensity</li>
</ul>

<p style = "margin-bottom:.75cm;"></p>

<ul>
<li>What is the <b>direction of the relationship?</b></li>
<ul>
<li>We cannot just regress the crime rate on police intensity</li>
<li>It's likely that more crime would cause a positive response in police activity</li>
<li>And also that police activity would deter crime</li>
</ul>
</ul>

---

### 3. Wrap up!
#### Measurement error

<ul>
<li><b>Measurement error</b> in the independent variable also induces a bias</li>
<ul>
<li>The resulting estimation would mechanically be <b>downward biased</b></li>
<li>The <b>noisier</b> the measure, the <b>larger the bias</b></li>
</ul>
</ul>

<img src="slides_files/figure-html/unnamed-chunk-43-1.png" width="75%" style="display: block; margin: auto;" />

---

### 3. Wrap up!
#### Randomized Controlled Trials

<ul>
<li>A Randomized Controlled Trial (RCT) is a type of experiment in which the thing we want to know the impact of (called the treatment) is <b>randomly allocated</b> in the population</li>
<ul>
<li>The two <b>groups</b> would then have the same characteristics on expectation, and would be <b>comparable</b></li>
<li>It is a way to obtain <b>causality</b> from randomness</li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

<ul>
<li>RCTs are very <b>powerful tools</b> to sort out issues of:</li>
<ul>
<li>Omitted variables</li>
<li>Selection bias</li>
<li>Simultaneity</li>
</ul>
</ul>

--

<p style = "margin-bottom:1cm;"></p>

<ul>
<li>But RCTs are <b>not immune</b> to every problem:</li>
<ul>
<li>The sample must be representative and large enough</li>
<li>Participants should comply with their treatment status</li>
<li>Independent variables must not be noisy measures of the variable of interest</li>
<li>...</li>
</ul>
</ul>
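
---

### 3. Wrap up!
#### Multiple testing

<ul>
<li>A quick simulation illustrates the issue <i>(a sketch: 20 tests on pure noise, so any significant result is a false positive by construction)</i></li>
</ul>

```r
set.seed(1)

# Run 20 two-sample t-tests on pure noise and collect the p-values
pvalues <- replicate(20, t.test(rnorm(100), rnorm(100))$p.value)

# Chances are some p-values fall below .05 just by chance...
sum(pvalues < .05)

# ... while a Bonferroni correction (multiplying them by the
# number of tests) guards against such false positives
sum(p.adjust(pvalues, method = "bonferroni") < .05)
```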