Scikit-learn Cookbook Second Edition over 80 recipes for machine learning – Julian Avila & Trent Hauck

Chapter 1, High-Performance Machine Learning – NumPy, features your first machine learning algorithm with support vector machines. We distinguish between classification (what type?) and regression (how much?). We predict an outcome on data we have not seen.

Chapter 2, Pre-Model Workflow and Pre-Processing, exposes a realistic industrial setting with plenty of data munging and pre-processing. To do machine learning, you need good data, and this chapter tells you how to get it and get it into good form for machine learning.

Chapter 3, Dimensionality Reduction, discusses reducing the number of features to simplify machine learning and allow better use of computational resources.

Chapter 4, Linear Models with scikit-learn, tells the story of linear regression, the oldest predictive model, from the machine learning and artificial intelligence lenses. You deal with correlated features with ridge regression, eliminate related features with LASSO and cross- validation, or eliminate outliers with robust median-based regression.

Chapter 5, Linear Models – Logistic Regression, examines the important healthcare datasets for cancer and diabetes with logistic regression. This model highlights both similarities and differences between regression and classification, the two types of supervised learning.

Chapter 6, Building Models with Distance Metrics, places points in your familiar Euclidean space of school geometry, as distance is synonymous with similarity. How close (similar) or far away are two points? Can we group them together? With Euclid’s help, we can approach unsupervised learning with k-means clustering and place points in categories we do not know in advance.

Chapter 7, Cross-Validation and Post-Model Workflow, features how to select a model that works well with cross-validation: iterated training and testing of predictions. We also save computational work with the pickle module.

Chapter 8, Support Vector Machines, examines in detail the support vector machine, a powerful and easy-to-understand algorithm.

Chapter 9, Tree Algorithms and Ensembles, features the algorithms of decision making: decision trees. This chapter introduces meta-learning algorithms, diverse algorithms that vote in some fashion to increase overall predictive accuracy.

Chapter 10, Text and Multiclass Classification with scikit-learn, reviews the basics of natural language processing with the simple bag-of-words model. In general, we view classification with three or more categories.

Chapter 11, Neural Networks, introduces a neural network and perceptrons, the components of a neural network. Each layer figures out a step in a process, leading to a desired outcome. As we do not program any steps specifically, we venture into artificial intelligence. Save the neural network so that you can keep training it later, or load it and utilize it as part of a stacking ensemble.

Chapter 12, Create a Simple Estimator, helps you make your own scikit-learn estimator, which you can contribute to the scikit-learn community and take part in the evolution of data science with scikit-learn.