- [[L2 regularization]], [[elastic net regularization]], [[L1 vs L2 regularization]], [[feature selection]]
# Idea
When [[L1 regularization]] is applied to [[linear regression]], the resulting model is called **Lasso**; for other models, like [[logistic regression]], we simply say L1.
L1 regularization sums the absolute values of the coefficients, scales the sum by λ, and adds the result to the loss.
## Feature selection
The L1 penalty term enables [[feature selection]]. See [[lasso feature selection]].
$$
\lambda \sum_i \left|\beta_i\right|
$$
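A minimal numeric sketch of this penalty added to a squared-error loss (the linear-regression/Lasso case); `X`, `y`, `beta`, and `lam` below are made-up illustrative values:
```python
import numpy as np

# Hypothetical data and candidate coefficients, just to show the objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                              # design matrix
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)
beta = np.array([1.5, 0.1, -0.8, 0.05, 0.4])               # candidate coefficients
lam = 0.1                                                   # regularization strength

squared_error = np.sum((y - X @ beta) ** 2)
l1_penalty = lam * np.sum(np.abs(beta))                     # lambda * sum(|beta_i|)
lasso_objective = squared_error + l1_penalty
print(lasso_objective)
```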
Lasso shrinks some coefficients exactly to 0 because of the [[constant derivative effect]]: the magnitude of the pull toward 0 is the same for every coefficient, regardless of its size. Imagine each coefficient being pulled toward zero by a constant force of strength λ. If the data provides enough evidence for a coefficient to be non-zero, it resists the pull but is still shrunk somewhat; if the evidence is weak, the constant pull eventually drags the coefficient all the way to zero.
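A quick derivative check (not in the original note) makes the constant pull explicit: the gradient of the L1 penalty depends only on the sign of the coefficient, while the L2 pull shrinks along with the coefficient itself.
$$
\frac{\partial}{\partial \beta_i}\,\lambda\left|\beta_i\right| = \lambda\,\operatorname{sign}(\beta_i)
\qquad\text{vs.}\qquad
\frac{\partial}{\partial \beta_i}\,\lambda\beta_i^2 = 2\lambda\beta_i
$$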
The [[constant derivative effect]] enables principled feature selection.
![[Pasted image 20210118232454.png|600]]
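A minimal sklearn sketch of L1-driven feature selection, in the spirit of the DataCamp exercise linked below; the synthetic dataset and `C=0.1` are arbitrary illustrative choices:
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 20 features, only 3 of which are informative.
X, y = make_classification(n_samples=500, n_features=20, n_informative=3,
                           n_redundant=0, random_state=0)

# L1-penalized logistic regression (smaller C = stronger penalty).
# 'liblinear' is one of the sklearn solvers that supports penalty='l1'.
clf = LogisticRegression(penalty="l1", C=0.1, solver="liblinear")
clf.fit(X, y)

# Coefficients of uninformative features are driven exactly to zero,
# so the non-zero entries act as the selected features.
selected = np.flatnonzero(clf.coef_[0])
print("non-zero coefficients:", selected)
```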
# References
- [Linear Classifiers in Python: logistic regression (DataCamp)](https://campus.datacamp.com/courses/linear-classifiers-in-python/logistic-regression-3?ex=1)
- [feature selection with sklearn L1](https://campus.datacamp.com/courses/linear-classifiers-in-python/logistic-regression-3?ex=3)
- [udacity course](https://www.youtube.com/watch?v=PyFNIcsNma0&feature=emb_logo)
- [Perplexity](https://www.perplexity.ai/search/how-does-lasso-regression-shri-5ixjkZrNQlKVVp9a4UZwuA)
- [linear model - Why lasso for feature selection? - Cross Validated](https://stats.stackexchange.com/questions/367155/why-lasso-for-feature-selection)