- [[L2 regularization]], [[elastic net regularization]], [[L1 vs L2 regularization]], [[feature selection]]
# Idea
When [[L1 regularization]] is applied to [[linear regression]], the resulting model is called **Lasso**; for other models, like [[logistic regression]], we simply say L1.
L1 regularization sums the absolute values of the coefficients, scales the sum by λ, and adds the result to the loss.
## Feature selection
The L1 penalty term enables [[feature selection]]. See [[lasso feature selection]].
$$
\lambda \sum_i \left|\beta_i\right|
$$
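A minimal numeric sketch of this penalty added to a squared-error loss (the linear-regression/Lasso case); `X`, `y`, `beta`, and `lam` below are made-up illustrative values:
```python
import numpy as np

# Hypothetical data and candidate coefficients, just to show the objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                              # design matrix
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)
beta = np.array([1.5, 0.1, -0.8, 0.05, 0.4])               # candidate coefficients
lam = 0.1                                                   # regularization strength

squared_error = np.sum((y - X @ beta) ** 2)
l1_penalty = lam * np.sum(np.abs(beta))                     # lambda * sum(|beta_i|)
lasso_objective = squared_error + l1_penalty
print(lasso_objective)
```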
Lasso shrinks some coefficients exactly to 0 because of the [[constant derivative effect]]: the magnitude of the pull toward 0 is the same for every coefficient, regardless of its size. Imagine each coefficient being pulled toward zero by a constant force of strength λ. If the data provides enough evidence for a coefficient to be non-zero, it resists the pull but is still shrunk somewhat; if the evidence is weak, the constant pull eventually drags the coefficient all the way to zero.
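A quick derivative check (not in the original note) makes the constant pull explicit: the gradient of the L1 penalty depends only on the sign of the coefficient, while the L2 pull shrinks along with the coefficient itself.
$$
\frac{\partial}{\partial \beta_i}\,\lambda\left|\beta_i\right| = \lambda\,\operatorname{sign}(\beta_i)
\qquad\text{vs.}\qquad
\frac{\partial}{\partial \beta_i}\,\lambda\beta_i^2 = 2\lambda\beta_i
$$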
The [[constant derivative effect]] enables principled feature selection.
![[Pasted image 20210118232454.png|600]]
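A minimal sklearn sketch of L1-driven feature selection, in the spirit of the DataCamp exercise linked below; the synthetic dataset and `C=0.1` are arbitrary illustrative choices:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 3 of which are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

# L1-penalized logistic regression (smaller C = stronger penalty).
# 'liblinear' is one of the sklearn solvers that supports penalty='l1'.
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
clf.fit(X, y)

# Coefficients of uninformative features are driven exactly to zero,
# so the non-zero entries act as the selected features.
selected = np.flatnonzero(clf.coef_[0])
print("non-zero coefficients:", selected)
```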
# References
- [Linear Classifiers in Python: logistic regression (DataCamp)](https://campus.datacamp.com/courses/linear-classifiers-in-python/logistic-regression-3?ex=1)
- [feature selection with sklearn L1](https://campus.datacamp.com/courses/linear-classifiers-in-python/logistic-regression-3?ex=3)
- [udacity course](https://www.youtube.com/watch?v=PyFNIcsNma0&feature=emb_logo)
- [Perplexity](https://www.perplexity.ai/search/how-does-lasso-regression-shri-5ixjkZrNQlKVVp9a4UZwuA)
- [linear model - Why lasso for feature selection? - Cross Validated](https://stats.stackexchange.com/questions/367155/why-lasso-for-feature-selection)