- [[regularized logistic regression]]
- [[L1 regularization]], [[L2 regularization]], [[elastic net regularization]]

# Idea

Regularizing helps to prevent [[overfitting]]. It penalizes more complex models, which tend to generalize worse than simpler models. It is directly related to the [[bias-variance tradeoff]]. Regularizing reduces the size of the parameters/coefficients and reduces [[training accuracy]], but tends to increase [[testing accuracy]].

One effect of regularizing is that [[decision value - machine learning classifier output|decision values]] (e.g., the predicted probability of a given class) will be much closer to chance, which means the model is less confident in its predictions.

We can control the amount of regularization by changing a parameter. In `sklearn`, it's the [alpha parameter](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html), which corresponds to the $\lambda$ parameter. If $\lambda$ is 0, there is no regularization. If it is large and positive, all parameters are driven to 0 (leaving only the constant $b$, if we don't regularize that parameter), resulting in an extremely simple and highly biased model. If it is negative, the model becomes more complicated and has high variance. (See the `sklearn` sketch below the cost-function sections.)

![[20231224125342.png]]

![[20231224125947.png]]

Note that [[feature scaling]] is highly recommended when regularization is used. Regularization can be applied to both regression and classification problems.

![[20231224131512.png]]

## Linear regression

The cost function for regularized linear regression is:

$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{1}$

where:

$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{2}$

Compare this to the cost function without regularization, which is of the form:

$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2$

The difference is the regularization term, $\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$. Including this term encourages gradient descent to minimize the size of the parameters. Note that the parameter $b$ is not regularized; this is standard practice.

![[20231224130448.png]]

![[20231224130536.png]]

![[20231224130831.png]]

![[20231224131001.png]]

## Logistic regression

For regularized **logistic** regression, the cost function is of the form:

$J(\mathbf{w},b) = \frac{1}{m} \sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) \right] + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 \tag{3}$

where:

$f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \text{sigmoid}(\mathbf{w} \cdot \mathbf{x}^{(i)} + b) \tag{4}$

Compare this to the cost function without regularization:

$J(\mathbf{w},b) = \frac{1}{m}\sum_{i=0}^{m-1} \left[ -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)\right]$

As in linear regression above, the difference is the regularization term, $\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$. Including this term encourages gradient descent to minimize the size of the parameters. Again, the parameter $b$ is not regularized; this is standard practice.

![[20231224131245.png]]

![[20231224131337.png]]
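As a concrete illustration of equations (1)–(4), here is a minimal `numpy` sketch of both regularized cost functions. The function and argument names (`compute_cost_linear_reg`, `compute_cost_logistic_reg`, `lambda_`) are my own choices for illustration; the equations above only define the math, not a particular implementation.

```python
import numpy as np

def compute_cost_linear_reg(X, y, w, b, lambda_=1.0):
    """Regularized linear regression cost, equation (1). X: (m, n), y: (m,), w: (n,), b: scalar."""
    m = X.shape[0]
    err = X @ w + b - y                        # f_wb(x^(i)) - y^(i) for all examples, via equation (2)
    cost = (err @ err) / (2 * m)               # squared-error term
    reg = (lambda_ / (2 * m)) * np.sum(w**2)   # regularization term; b is not regularized
    return cost + reg

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compute_cost_logistic_reg(X, y, w, b, lambda_=1.0):
    """Regularized logistic regression cost, equation (3). y holds 0/1 labels."""
    m = X.shape[0]
    f = sigmoid(X @ w + b)                     # equation (4)
    cost = np.mean(-y * np.log(f) - (1 - y) * np.log(1 - f))
    reg = (lambda_ / (2 * m)) * np.sum(w**2)   # same regularization term as the linear case
    return cost + reg
```

Setting `lambda_=0` recovers the unregularized costs; increasing it makes large weights more expensive, which is what pushes gradient descent toward smaller parameters.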
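And a small `sklearn` sketch of the regularization knob mentioned in the idea section. Note the parameterization differs by estimator: `Lasso`/`Ridge` take `alpha` (larger means more regularization, like $\lambda$), while `LogisticRegression` takes `C`, which works inversely (roughly $1/\lambda$, so smaller `C` means more regularization). The synthetic dataset and the specific `C` values here are arbitrary, chosen only to show the trend.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Smaller C = stronger regularization (C is roughly 1/lambda in sklearn's parameterization).
for C in [100.0, 1.0, 0.01]:
    model = make_pipeline(StandardScaler(),  # feature scaling is recommended with regularization
                          LogisticRegression(C=C, max_iter=1000))
    model.fit(X, y)
    coefs = model.named_steps["logisticregression"].coef_
    probs = model.predict_proba(X)[:, 1]
    print(f"C={C:>6}: mean |w| = {np.abs(coefs).mean():.3f}, "
          f"mean |p - 0.5| = {np.abs(probs - 0.5).mean():.3f}")
```

With smaller `C`, the mean coefficient magnitude shrinks and the predicted probabilities cluster nearer 0.5 (chance), matching the intuition above about less confident predictions.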
## Underfitting and overfitting

When the algorithm underfits, it has high bias. When it overfits, it has high variance. The terms bias and variance come from the [[bias-variance tradeoff]].

# References

- https://campus.datacamp.com/courses/linear-classifiers-in-python/logistic-regression-3?ex=1
- https://www.youtube.com/watch?v=PyFNIcsNma0&feature=emb_logo
- [cohen udemy regularization](https://www.udemy.com/course/deeplearning_x/learn/lecture/27844198#content)
- [The problem of overfitting - Week 3: Classification | Coursera](https://www.coursera.org/learn/machine-learning/lecture/erGPe/the-problem-of-overfitting)
- [Addressing overfitting - Week 3: Classification | Coursera](https://www.coursera.org/learn/machine-learning/lecture/HvDkF/addressing-overfitting)
- [Cost function with regularization - Week 3: Classification | Coursera](https://www.coursera.org/learn/machine-learning/lecture/UZTPk/cost-function-with-regularization)
- [Regularized linear regression - Week 3: Classification | Coursera](https://www.coursera.org/learn/machine-learning/lecture/WRULa/regularized-linear-regression)
- [Regularization and bias/variance - Advice for applying machine learning | Coursera](https://www.coursera.org/learn/advanced-learning-algorithms/lecture/JQZRO/regularization-and-bias-variance)