# Idea
- [[sklearn pipeline]], [[course - building effective ML workflow with scikit-learn - Kevin Markham]]
- [[feature selection]]
- [convert sparse to dense matrix within a pipeline](https://stackoverflow.com/questions/28384680/scikit-learns-pipeline-a-sparse-matrix-was-passed-but-dense-data-is-required)
- [saving model - sklearn](https://scikit-learn.org/stable/modules/model_persistence.html)
- [save sklearn model - model persistence](https://stackoverflow.com/questions/34143829/sklearn-how-to-save-a-model-created-from-a-pipeline-and-gridsearchcv-using-jobli)
- [save model - kaggle tutorial](https://www.kaggle.com/prmohanty/python-how-to-save-and-load-ml-models)
- [handle missing values in features with histogram-based gradient boosting models](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html)
- [[sklearn one hot encoder new categories in test vs train set]]
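- use pipelines to perform feature selection, so the selector is fit only on the training folds during cross-validation and never sees the held-out data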
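- okay to drop rows when values are missing at random (safest when missing completely at random), but not okay when they are not missing at random: the missingness then carries information, and dropping those rows biases the sample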
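
A minimal sketch of feature selection as a pipeline step. The synthetic dataset, the choice of `SelectKBest` with `f_classif`, `k=5`, and `LogisticRegression` are all illustrative assumptions, not taken from the course or the links above.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data (assumed for illustration): 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

# The selector is a pipeline step, so cross-validation refits it on each
# training fold and it never sees the held-out fold.
pipe = make_pipeline(
    SelectKBest(f_classif, k=5),
    LogisticRegression(max_iter=1000),
)

# One accuracy score per fold.
scores = cross_val_score(pipe, X, y, cv=5)

# Fit on all data and inspect which columns the selector kept.
pipe.fit(X, y)
selected = pipe.named_steps["selectkbest"].get_support()
```

`selected` is a boolean mask over the original columns. Selecting features on the full dataset before cross-validating would instead leak information from the held-out folds into the scores.
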
# References
- [dataschool - building effective ML workflow](https://courses.dataschool.io/courses/building-an-effective-machine-learning-workflow-with-scikit-learn/)
- [Towards Data Science - 19 Hidden Sklearn Features You Were Supposed to Learn The Hard Way (Bex T., Apr 2022)](https://towardsdatascience.com/19-hidden-sklearn-features-you-were-supposed-to-learn-the-hard-way-5293e6ff149)