# Idea
- **Computational efficiency:** L1 is harder to optimize because the absolute value is not differentiable at zero; L2 squares the errors, giving a smooth function that is easy to differentiate.
- **Data sparsity:** L1 is faster when the data are sparse, whereas L2 is faster for non-sparse data.
- **Feature selection:** L1 performs implicit feature selection, whereas L2 doesn't. That is, with L1, features that don't contribute to the model get coefficient values of exactly 0.

![[Pasted image 20210125205755.png]]

Note that by default, [[scikit-learn]]'s [[logistic regression]] uses [[L2 regularization]].

# References
- [udacity](https://www.youtube.com/watch?v=PyFNIcsNma0&feature=emb_logo)
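
# Example
A minimal sketch of the feature-selection point above: on synthetic data (feature counts and `C` chosen here for illustration), an L1-penalized logistic regression drives the coefficients of uninformative features to exactly 0, while the L2-penalized model keeps them small but nonzero.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: 5 informative features plus 15 pure-noise features.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_redundant=0, random_state=0)

# The L1 penalty requires a solver that supports it (e.g. liblinear or saga).
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
# L2 is scikit-learn's default penalty.
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

l1_zeros = int(np.sum(l1.coef_ == 0))
l2_zeros = int(np.sum(l2.coef_ == 0))
print(f"L1 zeroed {l1_zeros} of 20 coefficients; L2 zeroed {l2_zeros}")
```

The L1 model zeroes out many of the noise coefficients; the L2 model shrinks all coefficients toward zero but leaves them nonzero.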