Practical Machine Learning

Practical information

Course Info: CS 329P, 2021 Autumn, Stanford
Instructors: Qingqing Huang, Mu Li, Alex Smola
Lectures: Wed, Fri 9:45 AM - 11:15 AM
Room: 200-205
Course Forum: Ed Discussion (enrolled students only)
Grading Policy: Midterm (10%), Homework (40%), Project (50%)

Overview

Applying Machine Learning (ML) to solve real problems accurately and robustly requires more than just training the latest ML model. First, you will learn practical techniques to deal with data. This matters because real data is often not independently and identically distributed. These techniques include detecting covariate, concept, and label shifts, and modeling dependent random variables such as those in time series and graphs. Next, you will learn how to train ML models efficiently, including hyperparameter tuning, model combination, and transfer learning. Last, you will learn about fairness and model explainability, and how to deploy models efficiently. This class covers statistics, algorithms, and code implementations. Homework assignments and the final project emphasize solving real problems.
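As a rough illustration of what detecting covariate shift can look like in code, the sketch below compares one feature's distribution in training data against new data using a two-sample Kolmogorov-Smirnov statistic. It is not taken from the course materials: the function name `ks_statistic`, the synthetic Gaussian data, and the sample sizes are all assumptions made for this example.

```python
import random


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples,
        # so that ties are handled consistently.
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        # i / len(a) and j / len(b) are the empirical CDF values at x.
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap


# Hypothetical data: one feature observed at training time and at deployment time.
rng = random.Random(0)
train = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same input distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # mean moved by 1 (covariate shift)

ks_same = ks_statistic(train, same)
ks_shifted = ks_statistic(train, shifted)
print(f"KS(train, same)    = {ks_same:.3f}")
print(f"KS(train, shifted) = {ks_shifted:.3f}")
```

A statistic near 0 suggests the two samples follow similar distributions, while a large value flags a feature whose distribution has moved. In practice one would convert the statistic to a p-value and repeat the check for each feature.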

Prerequisites

Python programming, machine learning (CS 229), and basic statistics. Equivalent knowledge is fine, and we will try to make the class as self-contained as possible. This is a class where you need to get your hands dirty with programming.

Instructors

Qingqing Huang

Sr. Research Scientist, Google Brain

Mu Li

Sr. Principal Scientist, AWS

Alex Smola

VP/Distinguished Scientist, AWS

Lectures

The tentative schedule is as follows. Note that topics in italics are optional: we may either remove them or provide self-study videos.

Part I: Basic ML Modeling
Date | Lecture | Topics
9/22 | 1. Data I | Logistics, course introduction, data acquisition
9/24 | 2. Data II | Web scraping, data labeling, exploratory data analysis
9/29 | 3. Data III | Data cleaning, data transformation, feature engineering, data summary
10/1 | 4. ML model recap I | ML overview, tree methods, linear methods
10/6 | 5. ML model recap II | Neural networks
Assignment 1 due in
10/8 | 6. Model Validation | Evaluation metrics, underfitting and overfitting, model validation
10/13 | 7. Model Combination | Bias and variance, bagging, boosting, stacking
10/15 | Midterm Presentation
Part II: Broken Assumptions
Date | Lecture | Topics
10/20 | 8. Covariate Shift | Generalization performance recap, covariate shift
10/22 | 9. Covariate Shift II | Covariate shift with more math, adversarial data and invariants
10/27 | Midterm Exam
10/29 | 10. Label Shift | Two-sample test, label shift
Assignment 2 due in
11/3 | 11. Data beyond IID | Independence tests, sequence models, graphs
Part III: Performance Tuning
Date | Lecture | Topics
11/5 | 12. Model Tuning | Model tuning, HPO algorithms, NAS algorithms
11/10 | 13. Deep Network Tuning | Batch and layer norms, residual connections, attention
11/12 | 14. Transfer Learning | Fine-tuning for CV, fine-tuning for NLP, prompt-based learning
11/17 | 15. Model Compression | Pruning and quantization, knowledge distillation
Assignment 3 due in
11/19 | 16. Multimodal data | Multimodal data
11/24 | Thanksgiving Recess
11/26 | Thanksgiving Recess
Part IV: Beyond the Model
Date | Lecture | Topics
12/1 | 17. Fairness | Examples, law, risk distributions, criteria, in practice
12/3 | 18. Explainability | Explainability, strategies, conditioning and backdoors, axiomatic approaches, heuristics
12/8 | Final Presentation
12/10 | Final Presentation

Course Format

The evaluation is as follows: midterm exam (10%), homework (40%), and project (50%). In the midterm exam, we will ask some theory questions, ask you to spot mistakes in code examples, and ask you to describe modeling challenges along with solutions.

There are 4 assignments. They contain questions similar to those in the midterm exam. More importantly, we will ask you to write code to solve real problems with ML, building on the baseline implementations we provide.

These assignments may inspire your choice of course project. The course project has two presentations. For the midterm presentation, each group will provide a 1-page summary of project progress and an execution plan, and prepare 3 slides for a 5-minute presentation. The final presentation will be 10 minutes long, and the final report is up to 6 pages in ICML style.