Practical Machine Learning
Practical information
Course Info: | CS 329P, 2021 Autumn, Stanford |
Instructors: | Qingqing Huang, Mu Li, Alex Smola |
Lectures: | Wed, Fri 9:45 AM - 11:15 AM |
Room: | 200-205 |
Course Forum: | Ed Discussion (enrolled students only) |
Grading Policy: | Midterm (10%), Homework (40%), Project (50%) |
Overview
Applying Machine Learning (ML) to solve real problems accurately and robustly requires more than just training the latest ML model. First, you will learn practical techniques for dealing with data. This matters because real data is often not independently and identically distributed. These techniques include detecting covariate, concept, and label shifts, and modeling dependent random variables such as those in time series and graphs. Next, you will learn how to train ML models efficiently, using techniques such as hyperparameter tuning, model combination, and transfer learning. Last, you will learn about fairness and model explainability, and how to deploy models efficiently. This class will teach statistics, algorithms, and code implementations. Homework and the final project emphasize solving real problems.
Prerequisites
Python programming, machine learning (CS 229), and basic statistics. Equivalent knowledge is fine, and we will try to make the class as self-contained as possible. This is a class where you need to get your hands dirty with programming.
Instructors
Qingqing Huang
Sr. Research Scientist, Google Brain
Mu Li
Sr. Principal Scientist, AWS
Alex Smola
VP/Distinguished Scientist, AWS
Lectures
The tentative schedule is listed below. Note that topics in italics are optional: we may either remove them or provide self-study videos.
Part I: Basic ML Modeling | ||
Date | Lecture | Topics |
9/22 | 1. Data I | Logistics, course introduction, data acquisition |
9/24 | 2. Data II | Web scraping, data labeling, exploratory data analysis |
9/29 | 3. Data III | Data cleaning, data transformation, feature engineering, data summary |
10/1 | 4. ML model recap I | ML overview, tree methods, linear methods |
10/6 | 5. ML model recap II | Neural networks |
Assignment 1 due in | ||
10/8 | 6. Model Validation | Evaluation metrics, underfitting and overfitting, model validation |
10/13 | 7. Model Combination | Bias and variance, bagging, boosting, stacking |
10/15 | Midterm Presentation | |
Part II: Broken Assumptions | ||
Date | Lecture | Topics |
10/20 | 8. Covariate Shift | Generalization performance recap, covariate shift |
10/22 | 9. Covariate Shift II | Covariate shift with more math, adversarial data and invariants |
10/27 | Midterm Exam | |
10/29 | 10. Label Shift | Two sample test, label shift |
Assignment 2 due in | ||
11/3 | 11. Data beyond IID | Independence tests, sequence models, graphs |
Part III: Performance Tuning | ||
Date | Lecture | Topics |
11/5 | 12. Model Tuning | Model tuning, HPO algorithms, NAS algorithms |
11/10 | 13. Deep Network Tuning | Batch and layer norms, residual connections, attention |
11/12 | 14. Transfer Learning | Fine-tuning for CV, fine-tuning for NLP, prompt-based learning |
11/17 | 15. Model Compression | Pruning and quantization, knowledge distillation |
Assignment 3 due in | ||
11/19 | 16. Multimodal data | Multimodal data |
11/24 | Thanksgiving Recess | |
11/26 | Thanksgiving Recess | |
Part IV: Beyond the Model | ||
Date | Lecture | Topics |
12/1 | 17. Fairness | Examples, law, risk distributions, criteria, in practice |
12/3 | 18. Explainability | Explainability, strategies, conditioning and backdoors, axiomatic approaches, heuristics |
12/8 | Final Presentation | |
12/10 | Final Presentation | |
Course Format
The evaluation is as follows: midterm exam (10%), homework (40%), and project (50%). In the midterm exam, we will ask some theory questions, ask you to spot mistakes in code examples, and ask you to describe modeling challenges and their solutions.
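The weighting can be read as a simple weighted average of the three components. The following is a minimal sketch, assuming each component is scored on a 0-100 scale (the scale is an assumption; only the percentages come from the grading policy). The names `WEIGHTS`, `final_score`, and the example scores are hypothetical and used for illustration only.

```python
# Weights taken from the grading policy: midterm 10%, homework 40%, project 50%.
WEIGHTS = {"midterm": 0.10, "homework": 0.40, "project": 0.50}


def final_score(scores):
    """Return the weighted average of the component scores.

    Hypothetical helper: assumes every component is on the same 0-100 scale.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores, for illustration only.
example = {"midterm": 80.0, "homework": 90.0, "project": 85.0}
print(final_score(example))
```

With these made-up scores, the project contributes the most to the total (0.50 x 85), which reflects how heavily the course weights the project relative to the midterm exam.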
There are 4 assignments. They contain questions similar to those in the midterm exam. More importantly, we will ask you to write code that solves real problems with ML, building on the baseline implementations we provide.
These assignments may inspire your choice of course project. The course project will have two presentations. For the midterm presentation, each group will provide a 1-page summary of project progress and an execution plan, and prepare 3 slides for a 5-minute presentation. The final presentation will be 10 minutes long, and the final report is up to 6 pages in ICML style.