Practical Machine Learning

Practical information

Course Info:	CS 329P, 2021 Autumn, Stanford
Instructors:	Qingqing Huang, Mu Li, Alex Smola
Lectures:	Wed, Fri 9:45 AM - 11:15 AM
Room:	200-205
Course Forum:	Ed Discussion (enrolled students only)
Grading Policy:	Midterm (10%), Homework (40%), Project (50%)

Overview

Applying machine learning (ML) to solve real problems accurately and robustly requires more than just training the latest ML model. First, you will learn practical techniques for dealing with data. This matters because real data is often not independent and identically distributed. These techniques include detecting covariate, concept, and label shifts, and modeling dependent random variables such as those in time series and graphs. Next, you will learn how to train ML models efficiently through hyper-parameter tuning, model combination, and transfer learning. Last, you will learn about fairness and model explainability, and how to deploy models efficiently. This class covers statistics, algorithms, and code implementations. Homework and the final project emphasize solving real problems.

Prerequisites

Python programming, machine learning (CS 229), and basic statistics. Equivalent knowledge is fine, and we will try to make the class as self-contained as possible. This is a class where you need to get your hands dirty with programming.

Instructors

Qingqing Huang

Sr. Research Scientist, Google Brain

Mu Li

Sr. Principal Scientist, AWS

Alex Smola

VP/Distinguished Scientist, AWS

Lectures

The tentative schedule is as follows. Topics in italics are optional: we may either remove them or provide self-study videos.

Part I: Basic ML Modeling
Date	Lecture	Topics
9/22	1. Data I	Logistics, course introduction, data acquisition
9/24	2. Data II	Web scraping, data labeling, exploratory data analysis
9/29	3. Data III	Data cleaning, data transformation, feature engineering, data summary
10/1	4. ML model recap I	ML overview, tree methods, linear methods
10/6	5. ML model recap II	Neural networks
Assignment 1 due in
10/8	6. Model Validation	Evaluation metrics, underfitting and overfitting, model validation
10/13	7. Model Combination	Bias and variance, bagging, boosting, stacking
10/15	Midterm Presentation
Part II: Broken Assumptions
Date	Lecture	Topics
10/20	8. Covariate Shift	Generalization performance recap, covariate shift
10/22	9. Covariate Shift II	Covariate shift with more math, adversarial data and invariants
10/27	Midterm Exam
10/29	10. Label Shift	Two sample test, label shift
Assignment 2 due in
11/3	11. Data beyond IID	Independence tests, sequence models, graphs
Part III: Performance Tuning
Date	Lecture	Topics
11/5	12. Model Tuning	Model tuning, HPO algorithms, NAS algorithms
11/10	13. Deep Network Tuning	Batch and layer norms, residual connections, attention
11/12	14. Transfer Learning	Fine-tuning for CV, fine-tuning for NLP, prompt-based learning
11/17	15. Model Compression	Pruning and quantization, knowledge distillation
Assignment 3 due in
11/19	16. Multimodal data	Multimodal data
11/24	Thanksgiving Recess
11/26	Thanksgiving Recess
Part IV: Beyond the Model
Date	Lecture	Topics
12/1	17. Fairness	Examples, law, risk distributions, criteria, in practice
12/3	18. Explainability	Explainability, strategies, conditioning and backdoors, axiomatic approaches, heuristics
12/8	Final Presentation
12/10	Final Presentation

Course Format

The evaluation is as follows: midterm exam (10%), homework (40%), and project (50%). In the midterm exam, we will ask some theory questions, ask you to spot mistakes in code examples, and ask you to describe modeling challenges and their solutions.

There are 4 assignments. They contain questions similar to those on the midterm exam. More importantly, we will ask you to write code that solves real problems with ML, building on the baseline implementations we provide.

These assignments may inspire your choice of course project. The course project has two presentations. For the midterm presentation, each group will provide a 1-page summary of project progress and execution plan, and prepare 3 slides for a 5-minute presentation. The final presentation will be 10 minutes long, and the final report is up to 6 pages in ICML style.