# Idea

A method to prevent [[overfitting]] by constructing many models. To bootstrap a data set, we create multiple data sets of equal size by randomly drawing data points from the original data. The points are drawn with replacement: after we draw a data point, we put it back in the "bag" so that we might draw it again. Each bootstrapped data set therefore contains multiple copies of some data points and no copies of others. We then fit a (nonlinear) model to each data set, giving us a collection of models.

Bagging captures robust nonlinear effects, because those effects show up across multiple random samples of the data, while avoiding idiosyncratic patterns that appear in only a single sample. By building diversity through random samples and then averaging the many models, bagging applies the logic that underpins the [[diversity prediction theorem]]: it creates diverse models, and the average of those models will be more accurate than a typical individual model. However, remember that [[the dimensionality of our data limits the number of models we can produce]]. Both the procedure and the averaging claim are sketched below the references.

# References

- [Why Bagging is So Ridiculously Effective At Variance Reduction?](https://www.dailydoseofds.com/why-bagging-is-so-ridiculously-effective-at-variance-reduction/)
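A minimal Python sketch of the procedure, assuming scikit-learn decision trees as the nonlinear base model; the function names (`fit_bagged_models`, `bagged_predict`) and the model count are illustrative, not from the source:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def fit_bagged_models(X, y, n_models=25):
    """Fit one tree per bootstrap sample drawn with replacement."""
    models = []
    n = len(X)
    for _ in range(n_models):
        # Draw n indices with replacement: some points repeat, others are left out.
        idx = rng.integers(0, n, size=n)
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def bagged_predict(models, X):
    """Average the predictions of the diverse models."""
    return np.mean([m.predict(X) for m in models], axis=0)

# Toy usage: a noisy sine curve.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
models = fit_bagged_models(X, y)
print(bagged_predict(models, X[:5]))
```

A single deep tree would chase the noise in its particular sample; averaging many trees, each fit to a different bootstrap sample, keeps the shared sine structure while the idiosyncratic errors cancel.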
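To make the averaging claim concrete, here is the diversity prediction theorem in its standard form; the notation ($M_i$ for model $i$'s prediction, $\bar{M}$ for their average, $V$ for the true value) is my own, not from the source:

$$
(\bar{M} - V)^2 \;=\; \frac{1}{n}\sum_{i=1}^{n}(M_i - V)^2 \;-\; \frac{1}{n}\sum_{i=1}^{n}(M_i - \bar{M})^2
$$

The left side is the squared error of the averaged model; the first term on the right is the average individual squared error, and the second is the diversity of the predictions. Since diversity is non-negative, the averaged model is never worse than the average individual model, and it is strictly better whenever the models disagree.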