two-stage least-squares

- [[indirect inference]], [[endogenous variables|endogenous variable]], [[two-sample two-stage least-squares]]

# Idea

When researchers use two-stage least-squares, they are performing an [[instrumental variables|instrumental variable analysis]]. It is a method for estimating a causal effect, and it is a consistent estimator of the [[local average treatment effect|complier average causal effect]].

![[s20220530_103856.png]]

$$ Y_{i}=\beta_{0}+A_{i} \beta_{1}+\epsilon_{i} $$

By [[randomization]], the instrument $Z$ and the error term $\epsilon_{i}$ are independent.

Rationale: Many variables can cause the treatment. We "split" the treatment into two pieces: the part that can be explained by the instrument, and the part that can be explained by everything else. The part explained by the instrument is the **adjusted treatment variable**.

## Two stages

### First stage effect

Estimate the instrument-treatment association via regression, `treatment ~ instrument`. We estimate the **predicted value of treatment $A$**, given instrument $Z$.

$$ \hat{A}_{i}=\hat{\alpha}_{0}+Z_{i} \hat{\alpha}_{1} $$

$$ \hat{A}_{i} \text { is an estimate of } \mathrm{E}(A \mid Z) $$

Matrix notation ($X$ is often used instead of $A$):

$$ \hat{X}=Z\left(Z^{\prime} Z\right)^{-1} Z^{\prime} X $$

If treatment $A$ is binary, then we're estimating the probability of treatment given $Z$.

### Second stage effect

Estimate the treatment effect from the association between outcomes and adjusted treatments, `outcome ~ adjusted_treatment`. Regress the outcome $Y$ on the fitted value from the first stage, $\hat{A}_{i}$:

$$ Y_{i}=\beta_{0}+\hat{A}_{i} \beta_{1}+\epsilon_{i} $$

$\hat{A}$ is the projection of $A$ onto the space spanned by $Z$. The estimate of $\beta_{1}$ is the estimate of the causal effect.
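
The two stages above can be sketched on simulated data. This is only an illustration under made-up assumptions: the data-generating process, the unobserved confounder `u`, the true effect of 2, the continuous treatment, and all variable names are hypothetical and do not come from the example analysed in this note.

```r
# Simulated data; every number below is an assumption made for illustration.
set.seed(1)
n <- 5000
z <- rbinom(n, 1, 0.5)          # randomized binary instrument
u <- rnorm(n)                   # unobserved confounder of treatment and outcome
a <- 0.8 * z + u + rnorm(n)     # treatment depends on the instrument and on u
y <- 2 * a + 2 * u + rnorm(n)   # outcome; the true effect of a is 2
dt <- data.frame(z = z, a = a, y = y)

# Naive regression: biased because u drives both a and y
beta_ols <- unname(coef(lm(y ~ a, data = dt))["a"])

# Stage 1: keep only the part of the treatment explained by the instrument
stage1 <- lm(a ~ z, data = dt)
dt$adjusted_treatment <- fitted(stage1)

# The same fitted values via the projection Z (Z'Z)^{-1} Z' a
Z <- cbind(1, dt$z)
a_hat_matrix <- Z %*% solve(crossprod(Z), crossprod(Z, dt$a))

# Stage 2: regress the outcome on the adjusted treatment
# (the point estimate is the 2SLS estimate; lm()'s standard errors are not valid here)
stage2 <- lm(y ~ adjusted_treatment, data = dt)
beta_2sls <- unname(coef(stage2)["adjusted_treatment"])

c(ols = beta_ols, two_stage = beta_2sls)
```

In this setup the naive coefficient should sit well above 2, while the two-stage coefficient should land close to it, because the instrument is independent of `u` by construction.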
Matrix notation ($\hat{X}$ = $\hat{A}$):

$$ \hat{\boldsymbol{\beta}}_{2 S L S}=\left(\hat{X}^{\prime} \hat{X}\right)^{-1} \hat{X}^{\prime} Y $$

#### Interpretation of $\beta_1$

$$ \beta_{1}=E(Y \mid \hat{A}=1)-E(Y \mid \hat{A}=0) $$

It's the contrast in expected outcome for a one-unit difference in the adjusted treatment, i.e. between the treated and the untreated.

With a binary instrument, $\hat{A}$ takes only two values, $\hat{\alpha}_{0}$ (when $Z=0$) and $\hat{\alpha}_{0}+\hat{\alpha}_{1}$ (when $Z=1$), which differ by the first stage effect:

$$ (\hat{\alpha}_{0}+\hat{\alpha}_{1}) - \hat{\alpha}_{0} = \hat{\alpha}_{1} $$

It's the [[local average treatment effect|complier average causal effect]]:

$$ \beta_{1}=\mathrm{CACE}=\frac{\mathrm{E}(Y \mid Z=1)-\mathrm{E}(Y \mid Z=0)}{\mathrm{E}(A \mid Z=1)-\mathrm{E}(A \mid Z=0)} $$

## Code implementation

```r
# generic template (dt, treatment, instrument, outcome are placeholders)
# stage 1
stage1 <- lm(treatment ~ instrument, data = dt)
dt$adjusted_treatment <- predict(stage1, newdata = dt)  # predict() takes newdata, not data

# stage 2
stage2 <- lm(outcome ~ adjusted_treatment, data = dt)

# actual example
library(ivpack); library(data.table)
data(card.data)
d <- data.table(card.data)
d[, educ12 := ifelse(educ > 12, 1, 0)]  # make treatment binary

# proportion of compliers (first stage)
prop_complier <- d[, mean(educ12), keyby = .(nearc4)][, diff(V1)]
prop_complier

# intent-to-treat effect (reduced form)
itt <- d[, mean(lwage), keyby = .(nearc4)][, diff(V1)]
itt

# complier avg causal effect
itt / prop_complier

# 2-stage least squares approach
s1 <- lm(educ12 ~ nearc4, d)
d[, predx := predict(s1, type = 'response')]
s2 <- lm(lwage ~ predx, d)
# SEs are incorrect! lm() doesn't adjust for predictions from the first stage
summary(s2)

# fixest library
# https://lrberge.github.io/fixest/reference/feols.html
library(fixest)
# use ~1 to estimate model without exogenous variables
femodel <- feols(lwage ~ 1 | educ12 ~ nearc4, data = d)
femodel
feols(lwage ~ 1 | educ12 ~ nearc4, data = d, vcov = "HC1")
femodel$iv_first_stage
femodel$iv_first_stage$educ12$scores
summary(femodel, stage = 1)
summary(femodel, stage = 1:2)
summary(femodel, stage = 2:1)

# with exogenous covariates
femodel2 <- feols(lwage ~ exper + reg661 + reg662 + reg663 + reg664 + reg665 +
                    reg666 + reg667 + reg668 | educ12 ~ nearc4,
                  data = d, vcov = "HC1")
femodel2

# ivpack library
ivmodel <- ivreg(lwage ~ educ12 | nearc4, x = TRUE, data = d)
summary(ivmodel)
robust.se(ivmodel)
table(ivmodel$x$projected)

# with exogenous covariates (they appear on both sides of the |)
ivmodel2 <- ivreg(lwage ~ educ12 + exper + reg661 + reg662 + reg663 + reg664 +
                    reg665 + reg666 + reg667 + reg668 |
                    nearc4 + exper + reg661 + reg662 + reg663 + reg664 +
                    reg665 + reg666 + reg667 + reg668,
                  x = TRUE, data = d)
summary(ivmodel2)
```

The estimate is the ratio of the reduced form (effect of $Z$ on $Y$) to the first stage (effect of $Z$ on $A$):

$$ \frac{\text{reduced form}}{\text{first stage}} $$

# Sensitivity analysis

- [[exclusion restriction]]: If $Z$ does directly affect $Y$ by an amount $\rho$, would my conclusions change? Vary $\rho$.
- [[monotonicity assumption]]: If the proportion of defiers was $\pi$, would my conclusions change?

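
The exclusion restriction check can be sketched as a small grid search. This assumes a linear model with a constant treatment effect in which $Z$ shifts $Y$ directly by $\rho$, so that the reduced form equals $\beta_1 \times \text{first stage} + \rho$ and the adjusted estimate is $(\text{ITT} - \rho) / \text{first stage}$. The function name `adjusted_cace`, the grid of $\rho$ values, and the input numbers are hypothetical and chosen only for illustration.

```r
# Adjusted effect estimate when the instrument has a direct effect rho on the outcome.
# Assumes a linear, constant-effect model: reduced form = beta1 * first_stage + rho.
adjusted_cace <- function(itt, first_stage, rho) {
  (itt - rho) / first_stage
}

# Hypothetical inputs (not estimates from any real data)
itt <- 0.15          # reduced form: effect of Z on Y
first_stage <- 0.12  # first stage: effect of Z on A

# Vary rho and see how the conclusion moves
rho_grid <- seq(0, 0.2, by = 0.05)
sens <- data.frame(
  rho = rho_grid,
  cace = adjusted_cace(itt, first_stage, rho_grid)
)
sens
```

With `rho = 0` the function returns the usual ratio of the intent-to-treat effect to the first stage. The adjusted estimate shrinks as `rho` grows and changes sign once `rho` exceeds the intent-to-treat effect, which gives a sense of how large a direct effect would have to be to overturn the conclusion.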
# References

- [12.1 The IV Estimator with a Single Regressor and a Single Instrument | Introduction to Econometrics with R](https://www.econometrics-with-r.org/12.1-TIVEWASRAASI.html)
- [Two stage least squares - Instrumental Variables Methods | Coursera](https://www.coursera.org/learn/crash-course-in-causality/lecture/5B3AW/two-stage-leasxt-squares)
- [IV analysis in R - Instrumental Variables Methods | Coursera](https://www.coursera.org/learn/crash-course-in-causality/lecture/D19Ae/iv-analysis-in-r)
- https://campus.datacamp.com/courses/causal-inference-with-r-instrumental-variables-rdd/instrumental-variables-in-practice?ex=14
- https://campus.datacamp.com/courses/causal-inference-with-r-instrumental-variables-rdd/instrumental-variables-in-practice?ex=18