MOOC

Machine learning in Python with scikit-learn

Build predictive models with scikit-learn and gain a practical understanding of the strengths and limitations of machine learning!

Open

May 18, 2021

🇬🇧

English

CC BY 4.0

Course description

Predictive modeling is a pillar of modern data science. In this field, scikit-learn is a central tool: it is easily accessible yet powerful, and it dovetails naturally with the wider ecosystem of data-science tools based on the Python programming language.

This course is an in-depth introduction to predictive modeling with scikit-learn. Its step-by-step, didactic lessons introduce the fundamental methodological and software tools of machine learning, making the course a stepping stone to more advanced challenges in artificial intelligence, text mining, or data science.

The course is more than a cookbook: it will teach you to be critical about each step in the design of a predictive modeling pipeline, from preprocessing the data and choosing models to understanding their failure modes and interpreting their predictions.

The training is essentially practical: it focuses on application examples, with code that participants run themselves.

Course objectives

  • Grasp the fundamental concepts of machine learning
  • Build a predictive modeling pipeline with scikit-learn
  • Develop intuitions about machine learning models, from linear models to gradient-boosted decision trees
  • Evaluate the statistical performance of your models

Who is this course for?

The course aims to be accessible without a strong technical background. The requirements for this course are:

  • basic knowledge of Python programming: defining variables, writing functions, importing modules
  • some prior experience with the NumPy, pandas and Matplotlib libraries is recommended but not required

Course outline

  • Module 1. The Predictive Modeling Pipeline
  • Module 2. Selecting the best model
  • Module 3. Hyperparameter tuning
  • Module 4. Linear Models
  • Module 5. Decision tree models
  • Module 6. Ensembles of models
  • Module 7. Evaluating model performance

Pedagogical team

The authors of the course are scikit-learn core developers. Authors:

  • Arturo Amor, engineer, Inria
  • Loïc Estève, scikit-learn core developer, Inria
  • Olivier Grisel, scikit-learn core developer, Inria
  • Guillaume Lemaître, scikit-learn core developer, Inria
  • Thomas Schmitt, machine learning engineer, Inria
  • Gaël Varoquaux, research director, project manager for the scikit-learn consortium, Inria

Pedagogical support:

  • Laurence Farhi, learning engineer, Inria Learning Lab
  • Marie Collin, learning engineer, Inria Learning Lab
  • Benoit Rospars, IT engineer, Inria Learning Lab

Additional resources

View the course