Machine learning in Python with scikit-learn
Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!
18 May 2021
English
CC BY 4.0
Course description
Predictive modeling is a pillar of modern data science. In this field, scikit-learn is a central tool: it is easily accessible yet powerful, and it dovetails naturally with the wider ecosystem of data-science tools based on the Python programming language.
This course is an in-depth introduction to predictive modeling with scikit-learn. Step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning. As such, the course is a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.
The course is more than a cookbook: it will teach you to be critical about each step in the design of a predictive modeling pipeline, from choices in data preprocessing to choosing models, gaining insights into their failure modes, and interpreting their predictions.
The training is primarily practical, focusing on example applications with code that participants run themselves.
Course objectives
- Grasp the fundamental concepts of machine learning
- Build a predictive modeling pipeline with scikit-learn
- Develop intuitions about machine learning models, from linear models to gradient-boosted decision trees
- Evaluate the statistical performance of your models
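To give a flavor of these objectives, here is a minimal sketch of a predictive modeling pipeline evaluated by cross-validation. It is not taken from the course materials: the synthetic dataset, the choice of `StandardScaler` followed by `LogisticRegression`, and the 5-fold setting are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (illustrative assumption).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# A pipeline chains a preprocessing step with a linear model.
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation: one accuracy score per held-out fold.
scores = cross_val_score(model, X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

Wrapping the scaler and the model in a single pipeline means the scaler is refit on the training folds only, so no information from the held-out fold leaks into preprocessing.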
Who is this course for?
The course aims to be accessible without a strong technical background. The requirements for this course are:
- basic knowledge of Python programming: defining variables, writing functions, importing modules
- some prior experience with the NumPy, pandas and Matplotlib libraries is recommended but not required
Course outline
- Module 1. The Predictive Modeling Pipeline
- Module 2. Selecting the best model
- Module 3. Hyperparameters tuning
- Module 4. Linear Models
- Module 5. Decision tree models
- Module 6. Ensemble of models
- Module 7. Evaluating model performance
Pedagogical team
The authors of the course are scikit-learn core developers. Authors:
- Arturo Amor, engineer, Inria
- Loïc Estève, scikit-learn core developer, Inria
- Olivier Grisel, scikit-learn core developer, Inria
- Guillaume Lemaître, scikit-learn core developer, Inria
- Thomas Schmitt, machine learning engineer, Inria
- Gaël Varoquaux, research director, project manager for the scikit-learn consortium, Inria
Pedagogical support:
- Laurence Farhi, learning engineer, Inria Learning Lab
- Marie Collin, learning engineer, Inria Learning Lab
- Benoit Rospars, IT engineer, Inria Learning Lab
Additional resources
- All the course materials are available at: https://inria.github.io/scikit-learn-mooc/