# Idea
Computational efficiency: L1 is computationally less convenient because the absolute value is not differentiable at zero, so it needs subgradient methods or specialized solvers; L2 squares the errors, giving a smooth penalty that is easy to differentiate everywhere.
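A one-line way to see this (my own sketch, writing the penalties on a single weight $w$):

$$\frac{\partial}{\partial w}\,|w| = \operatorname{sign}(w)\ \ (\text{undefined at } w = 0), \qquad \frac{\partial}{\partial w}\,w^2 = 2w$$

The L2 gradient shrinks smoothly toward zero, while the L1 gradient stays at $\pm 1$ right up to $w = 0$, which is also why L1 can push coefficients exactly to zero.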
Data sparsity: L1 tends to be more efficient when the solution is sparse (only a few features matter), whereas L2 tends to be more efficient in non-sparse settings where most features contribute.
Feature selection: L1 performs feature selection, whereas L2 doesn't. That is, with L1, features that don't contribute to the model end up with coefficients of exactly 0.
![[Pasted image 20210125205755.png]]
Note that by default, [[scikit-learn]]'s [[logistic regression]] uses [[L2 regularization]].
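A minimal sketch of the feature-selection point using [[scikit-learn]]'s `LogisticRegression` on a synthetic dataset (the dataset and `C` value here are illustrative assumptions, not from the source):

```python
# Compare L1 and L2 penalties in scikit-learn's LogisticRegression
# on synthetic data where only a few features are informative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features, only 5 of which actually carry signal.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# L1 needs a solver that supports it, e.g. liblinear or saga.
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(C=0.1).fit(X, y)  # L2 is the default penalty

print("L1 zero coefficients:", int(np.sum(l1.coef_ == 0)))
print("L2 zero coefficients:", int(np.sum(l2.coef_ == 0)))
```

L1 typically drives most of the uninformative coefficients exactly to zero, while L2 only shrinks them toward zero without eliminating them.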
# References
- [udacity](https://www.youtube.com/watch?v=PyFNIcsNma0&feature=emb_logo)