- [[confounding variable]]

# Idea

Good control variables (covariates) explain away much of the variance in the outcome, which lowers variability and makes treatment effects easier to detect.

As a rule, include [[confounding variable|confounders]] and variables that are good predictors of the outcome $Y$ in your model. Exclude variables that are good predictors of only the treatment $T$, mediators between the treatment and the outcome, and common effects of the treatment and the outcome.

## Good covariates

If a control variable is a good predictor of the outcome, it explains away a lot of the variance in the outcome, so the treatment effect is estimated more precisely. In the [[directed acyclic graphs|DAG]] below, $X$ is a good control variable.

![[s20220727_010843.png]]

If covariates are [[confounding variable|confounders]] (they cause both treatment and outcome), we **must** control for them; otherwise the treatment effect estimate is biased.

## Harmful covariates

Adding bad covariates to the model can improve predictive validity but hurt causal identification. Avoid adding controls that are good predictors of the treatment only: they absorb variation in the treatment and so increase the variance of the parameter estimates.

Below, control for `severity` (it is a confounder) but not for `hospital` (it is not a confounder: it predicts the treatment but not the outcome).

![[s20220727_012443.png]]

## Downright bad covariates

Never control for covariates that can cause [[bias|selection bias]]. These are common effects (e.g., `payments`) and mediators, i.e., variables on the path from cause to effect (e.g., `opened` or `agreement`).

![[s20220727_225142.png]]

# References

- [07 - Beyond Confounders — Causal Inference for the Brave and True](https://matheusfacure.github.io/python-causality-handbook/07-Beyond-Confounders.html)
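
# Simulation sketch

The rules in this note can be checked on simulated data. The sketch below is only an illustration under assumptions that are not in the source: a linear data-generating process with made-up coefficients, a continuous treatment, and a hypothetical `mediator` variable. `severity` and `hospital` play the roles described under "Harmful covariates". The helpers `ols`, `simulate` and `treatment_coef` are written for this example and use only the Python standard library.

```python
import random
import statistics

# Assumed total effect of treatment on outcome:
# 1.0 directly plus 1.0 through the hypothetical mediator.
TRUE_EFFECT = 2.0


def ols(columns, y):
    """Return OLS coefficients [intercept, b1, b2, ...] for y ~ 1 + columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(r[a] * r[b] for r in X) for b in range(k)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * z for x, z in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate(n, rng):
    """Draw one dataset from the assumed linear DAG (all coefficients invented)."""
    severity = [rng.gauss(0, 1) for _ in range(n)]  # confounder: causes T and Y
    hospital = [rng.gauss(0, 1) for _ in range(n)]  # causes T only
    treatment = [s + 2 * h + rng.gauss(0, 1) for s, h in zip(severity, hospital)]
    mediator = [t + rng.gauss(0, 1) for t in treatment]  # on the path T -> M -> Y
    outcome = [1.0 * t + 1.0 * m + 3 * s + rng.gauss(0, 1)
               for t, m, s in zip(treatment, mediator, severity)]
    return severity, hospital, treatment, mediator, outcome


def treatment_coef(outcome, treatment, controls):
    """Coefficient on treatment in outcome ~ treatment + controls."""
    return ols([treatment] + controls, outcome)[1]


rng = random.Random(0)

# One large sample: compare which controls recover the total effect.
sev, hosp, t, med, y = simulate(5000, rng)
naive = treatment_coef(y, t, [])                 # omits the confounder
adjusted = treatment_coef(y, t, [sev])           # controls for the confounder
with_mediator = treatment_coef(y, t, [sev, med])  # also controls for a mediator

# Many small samples: compare the spread of the estimate
# with and without the treatment-only predictor.
without_hosp, with_hosp = [], []
for _ in range(100):
    sev, hosp, t, med, y = simulate(300, rng)
    without_hosp.append(treatment_coef(y, t, [sev]))
    with_hosp.append(treatment_coef(y, t, [sev, hosp]))

print(f"true total effect:            {TRUE_EFFECT:.2f}")
print(f"no controls:                  {naive:.2f}")
print(f"severity:                     {adjusted:.2f}")
print(f"severity + mediator:          {with_mediator:.2f}")
print(f"spread, severity only:        {statistics.stdev(without_hosp):.3f}")
print(f"spread, severity + hospital:  {statistics.stdev(with_hosp):.3f}")
```

With these assumed coefficients, the population value of the treatment coefficient is 2.5 with no controls (confounding bias), 2.0 when controlling for `severity`, and 1.0 when the mediator is added (only the direct effect remains). Adding `hospital` leaves the estimate unbiased but should widen its spread across repeated samples, because `hospital` removes most of the usable variation in the treatment.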