- [[Frisch-Waugh-Lovell theorem]]
# Idea
The double-lasso IV method solves a specific, nasty problem: how do you pick which control variables and instruments to include in an IV regression when you have way more candidates than you can handle? Here's the intuition built from the ground up.
```
SYMBOLS
════════
y = sales (outcome)
d = price (endogenous)
x = controls (car features)
z = instruments (competitor characteristics)
α = causal effect of price on sales (what you want to estimate)
CLEANING (Steps 1-3, three separate Lasso/OLS regressions):
════════════════════════════
Step 1: d_i = x_i'γ̂ + z_i'δ̂ + residual → predicted price: d̂_i
Goal: Which controls and instruments actually predict price? Build a good Lasso prediction (the coefficients themselves are shrunk and biased).
- Clean price (endogenous)
- The output d̂_i is what Step 3 decomposes into a controls piece and an instruments piece (the "nudge").
- Lasso shrinks irrelevant coefficients to 0, so you don't carry noise into later stages.
Step 2: y_i = x_i'θ̂ + residual → unexplained sales: ρ̂ʸ_i = y_i − x_i'θ̂
Regress the outcome (sales) on the controls alone.
Goal: Which controls matter for explaining the variation in the outcome?
- Clean sales (outcome)
- The residual is the part of the outcome that the controls cannot explain; that "cleaned" part is what we keep.
Step 3: d̂_i = x_i'ϑ̂ + residual → residual (nudge/instrument in Step 4): v̂_i = d̂_i − x_i'ϑ̂
        unexplained price (endogenous variable in Step 4): ρ̂ᵈ_i = d_i − x_i'ϑ̂ (raw price minus its controls-only fit)
Goal: Separate the instrument piece from the control piece of the price prediction; cleanly decompose the prediction made in Step 1.
- Clean the nudge/instrument
- Regress the predicted endogenous variable d̂_i on the controls alone.
- Of the price that Step 1 predicted, how much came from controls vs. instruments? Keep only the instruments part (the residual).
ESTIMATION (Step 4, standard 2SLS, using ρ̂ᵈ, ρ̂ʸ, v̂):
═══════════════════════════════════
Stage 1: ρ̂ᵈ = π · v̂ + error → fitted ρ̂ᵈ
Stage 2: regress ρ̂ʸ on fitted ρ̂ᵈ → α̂ ← YOUR ANSWER
```
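The four steps above can be sketched end to end. This is a minimal illustration on simulated data, assuming scikit-learn's `LassoCV` for the Lasso stages and plain least squares (numpy) for the OLS projections; all variable names follow the SYMBOLS block, and the data-generating process is invented for the demo.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
n, p_x, p_z = 500, 20, 10
x = rng.normal(size=(n, p_x))          # controls (car features)
z = rng.normal(size=(n, p_z))          # instruments (competitor characteristics)
u = rng.normal(size=n)                 # unobserved confounder -> endogeneity
d = x[:, 0] + z[:, 0] + u + rng.normal(size=n)    # price
y = 2.0 * d + x[:, 0] - u + rng.normal(size=n)    # sales; true alpha = 2

def ols_resid(target, regressors):
    """Residual of an OLS projection (orthogonal to the regressors by construction)."""
    beta, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return target - regressors @ beta

# Step 1: Lasso of d on (x, z) -> predicted price d_hat
xz = np.hstack([x, z])
d_hat = LassoCV(cv=5).fit(xz, d).predict(xz)

# Step 2: Lasso of y on x -> unexplained sales rho_y
rho_y = y - LassoCV(cv=5).fit(x, y).predict(x)

# Step 3: project d_hat and d on x alone -> nudge v_hat, unexplained price rho_d
v_hat = ols_resid(d_hat, x)
rho_d = ols_resid(d, x)

# Step 4: 2SLS on the cleaned variables (one instrument -> simple IV ratio)
alpha_hat = (v_hat @ rho_y) / (v_hat @ rho_d)
print(alpha_hat)  # should land near the true alpha of 2
```

A naive OLS of `y` on `d` here would be biased by `u`; the cleaned IV ratio is not.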
This is also precisely why the "double" selection immunizes the estimate against selection mistakes. If a control matters for sales but Lasso misses it in Step 2, it can still be caught in Step 3 (or vice versa). Taking the union across the separate Lasso runs gives you more chances to catch important variables than any single Lasso would.
## Extra questions
### Step 1 vs step 3 predicted and raw values
Why do we use ρ̂ᵈ_i from Step 3 (raw price minus its controls-only fit) instead of using the residual from Step 1?
Because the Step 1 residual has the instrument signal **already removed**.
```
Step 1 residual: d_i − x_i'γ̂ − z_i'δ̂   ← controls AND instruments removed
                 = noise only (instruments partialled out)

ρ̂ᵈ from Step 3: d_i − x_i'ϑ̂           ← controls only removed
                 = instrument signal + endogenous stuff + noise
```
Stage 1 needs to project ρ̂ᵈ onto v̂ to isolate the exogenous part. If you'd already stripped out the instrument signal in Step 1, there's nothing left for v̂ to grab onto:
```
Using Step 1 residuals: v̂ ──> (noise) → π ≈ 0, no power
Using ρ̂ᵈ from Step 3: v̂ ──> (signal + noise) → π picks up signal
```
You want to remove controls (so they don't confound the estimate) but **keep** the instrument variation (so you can use it for identification). That's exactly what partialling out controls only does.
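This claim can be checked numerically. The toy below uses OLS everywhere (no Lasso, one control, one instrument; all names invented for the demo): Stage 1 run on the Step 1 residual finds exactly nothing for v̂ to grab, while Stage 1 run on ρ̂ᵈ recovers the signal. (The coefficient of exactly 1 in the second case is an artifact of the all-OLS toy, by Frisch-Waugh-Lovell algebra; with Lasso in Step 1 it would not be exactly 1.)

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))            # one control
z = rng.normal(size=n)                 # one instrument
d = x[:, 0] + z + rng.normal(size=n)   # price

def ols_resid(target, regressors):
    beta, *_ = np.linalg.lstsq(regressors, target, rcond=None)
    return target - regressors @ beta

xz = np.column_stack([x, z])
step1_resid = ols_resid(d, xz)         # controls AND instruments removed
d_hat = d - step1_resid                # Step 1 prediction
v_hat = ols_resid(d_hat, x)            # Step 3 nudge
rho_d = ols_resid(d, x)                # Step 3 unexplained price

# Stage 1 coefficient pi, computed two ways:
pi_signal = (v_hat @ rho_d) / (v_hat @ v_hat)       # on rho_d: picks up signal
pi_dead = (v_hat @ step1_resid) / (v_hat @ v_hat)   # on Step 1 residual: nothing left
print(pi_signal, pi_dead)  # ~1 and ~0 (up to floating-point noise)
```

`pi_dead` is zero because the Step 1 residual is orthogonal to everything in the span of (x, z), and v̂ lives entirely inside that span.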
### Relationship between steps 1 and 3
Step 3 depends on Step 1. Step 1 does not depend on Step 3.
```
Step 1: d_i = x_i'γ̂ + z_i'δ̂ + residual
│
d̂_i = x_i'γ̂ + z_i'δ̂ ─────┼──── this is the OUTPUT
│
▼
Step 3: d̂_i = x_i'ϑ̂ + v̂ ◄── d̂_i is the INPUT
```
Step 3 asks: "of the price that Step 1 predicted, how much came from controls vs instruments?" It can only ask that question if Step 1 has already produced d̂_i.
The reason you need both steps rather than just reading off γ̂ and δ̂ from Step 1 directly is **Lasso**. **Lasso does variable selection and prediction well, but its coefficients are biased (shrunk toward zero).** So you can't just say "γ̂ is the control part, δ̂ is the instrument part" and trust the split.
```
Why not just use Step 1's coefficients directly?
d̂_i = x_i'γ̂ + z_i'δ̂
│ │
│ └── biased (Lasso shrinkage)
└────────── biased (Lasso shrinkage)
The PREDICTION d̂_i is fine (Lasso predicts well),
but the DECOMPOSITION into γ̂ vs δ̂ is unreliable.
```
So Step 3 takes the reliable object (d̂_i as a single predicted vector) and re-decomposes it using a fresh regression, getting a clean separation:
```
Step 3: d̂_i = x_i'ϑ̂ + v̂
│ │
│ └── instrument-driven piece (what you keep)
└────────── control-driven piece (what you discard)
This split is reliable because d̂_i is a single vector
being projected onto x — no simultaneous Lasso selection
of x and z muddying the decomposition.
```
**In short:** Step 1 builds a good prediction. Step 3 cleanly decomposes it. You need both because Lasso predicts well but splits poorly.
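The "predicts well but splits poorly" point is easy to see in a toy fit. A sketch using scikit-learn's `Lasso` on simulated data (the penalty level and design are invented for illustration): the fitted coefficients are visibly shrunk below their true values, yet the prediction as a whole still tracks the target closely.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 400, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = 2.0                    # only three coefficients are truly nonzero
y = X @ beta_true + rng.normal(size=n)

fit = Lasso(alpha=0.2).fit(X, y)
print(fit.coef_[:3])      # shrunk toward zero, below the true value of 2
print(fit.score(X, y))    # yet the R^2 of the prediction stays high
```

So trusting the prediction while distrusting the individual coefficients is exactly the right attitude toward a Lasso fit.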
### Why isn't Step 3's coefficient bias an issue?
Step 3 **does** have biased coefficients. ϑ̂ itself is biased. But it doesn't matter — you never use ϑ̂. You use the **residual** v̂.
```
Step 3: d̂_i = x_i'ϑ̂ + v̂
You KEEP: v̂ = d̂_i − x_i'ϑ̂ (residual)
You DISCARD: ϑ̂ (coefficients)
```
The residual from an OLS projection has a guarantee that Lasso coefficients don't: **v̂ is orthogonal to x by construction**.
```
OLS residuals are ALWAYS orthogonal to regressors:
x'v̂ = 0 ← mechanical property of OLS,
holds regardless of whether ϑ̂ is
"correct" or biased or anything
This means v̂ contains ZERO control variation.
That's all you need — not correct coefficients,
just a clean residual.
```
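The orthogonality guarantee is easy to verify directly: whatever the target (here pure noise, with no model behind it at all), OLS residuals come out numerically orthogonal to every regressor column.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
target = rng.normal(size=200)          # any target at all, no model needed

beta, *_ = np.linalg.lstsq(X, target, rcond=None)
resid = target - X @ beta
print(np.abs(X.T @ resid).max())       # essentially zero (floating-point noise)
```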
The same logic applies everywhere in this procedure. Steps 2 and 3 both partial out controls (via Lasso and OLS respectively), and in both cases you only keep the residuals, never the coefficients:
```
Step 2: y on x → keep ρ̂ʸ, discard θ̂
Step 3: d̂ on x → keep v̂, discard ϑ̂
Step 1: d on x,z → keep d̂, discard γ̂, δ̂
No coefficient from any Lasso ever enters the final estimate.
Only residuals and predicted values carry forward.
```
The only coefficient you care about is α̂ in Stage 2 — and that comes from a simple OLS regression on already-cleaned variables, not from Lasso.