# Idea
- [[sklearn pipeline]], [[course - building effective ML workflow with scikit-learn - Kevin Markham]]
- [[feature selection]]
- [convert sparse to dense matrix within a pipeline](https://stackoverflow.com/questions/28384680/scikit-learns-pipeline-a-sparse-matrix-was-passed-but-dense-data-is-required)
- [saving model - sklearn](https://scikit-learn.org/stable/modules/model_persistence.html)
- [save sklearn model - model persistence](https://stackoverflow.com/questions/34143829/sklearn-how-to-save-a-model-created-from-a-pipeline-and-gridsearchcv-using-jobli)
- [save model - kaggle tutorial](https://www.kaggle.com/prmohanty/python-how-to-save-and-load-ml-models)
- [handle missing values in features with histogram-based gradient boosting models](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html)
- [[sklearn one hot encoder new categories in test vs train set]]
- use pipelines to perform feature selection
- okay to drop rows when their values are missing at random, but not okay when the values are not missing at random

# References
- [dataschool - building effective ML workflow](https://courses.dataschool.io/courses/building-an-effective-machine-learning-workflow-with-scikit-learn/)
- [19 Hidden Sklearn Features You Were Supposed to Learn The Hard Way (Bex T., Towards Data Science, Apr 2022)](https://towardsdatascience.com/19-hidden-sklearn-features-you-were-supposed-to-learn-the-hard-way-5293e6ff149)
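
A minimal sketch of three ideas from the Idea list: feature selection as a pipeline step, a one-hot encoder that tolerates categories seen only at test time, and saving the fitted pipeline with joblib. The toy data, the column names (`city`, `age`, `label`), and the step names are made up for illustration. `SelectKBest` with `chi2` is just one possible selector, not necessarily the one used in the course.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; "age" has missing values.
train = pd.DataFrame({
    "city": ["A", "B", "A", "C", "B", "A", "C", "B"],
    "age": [22.0, 35.0, np.nan, 41.0, 29.0, 53.0, 38.0, np.nan],
    "label": [0, 1, 0, 1, 1, 0, 1, 0],
})
# City "D" never appears in the training set.
test = pd.DataFrame({"city": ["A", "D"], "age": [30.0, np.nan]})

X_train, y_train = train[["city", "age"]], train["label"]

preprocess = ColumnTransformer(
    [
        # handle_unknown="ignore" encodes unseen categories as all zeros
        # instead of raising an error at predict time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
        # Impute rather than drop rows with a missing age.
        ("num", SimpleImputer(strategy="median"), ["age"]),
    ],
    # sparse_threshold=0 makes the transformer always return a dense array.
    sparse_threshold=0,
)

pipe = Pipeline([
    ("preprocess", preprocess),
    # Feature selection is fitted on the training data only, as part of fit().
    # chi2 needs non-negative features, which holds for this toy data.
    ("select", SelectKBest(chi2, k=2)),
    ("model", LogisticRegression(max_iter=1000)),
])
pipe.fit(X_train, y_train)
preds = pipe.predict(test)

# Persist the whole fitted pipeline (preprocessing + selection + model).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipeline.joblib")
    joblib.dump(pipe, path)
    restored = joblib.load(path)
    restored_preds = restored.predict(test)

n_selected = int(pipe.named_steps["select"].get_support().sum())
```

Keeping the selector inside the pipeline means cross-validation refits it on each training fold, so no information from held-out rows leaks into the selection. Saving the pipeline object rather than just the model keeps the preprocessing and the estimator in sync when it is loaded later.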