CS 839 Advanced Machine Learning Systems, Spring 2022

This course covers a wide range of topics in the design and implementation of systems for machine learning, including efficient model training, inference, and specialized systems designed for workloads such as graph learning and recommendation.

Course Learning Objectives

At the end of the course you will be able to

Critique and evaluate the design details of state-of-the-art machine learning systems
Develop and use tools to profile and understand the performance of machine learning systems
Propose new research ideas in topics related to machine learning systems
Design and implement new machine learning systems

Logistics

Course Number: CS 839, Spring 2022, UW Madison
Instructor: Shivaram Venkataraman
Time: Tuesday and Thursday, 1:00PM - 2:15PM
Location: 1325 CS
Office hours: TBD, CS 7367
Discussion: We will use Piazza for Q&A outside of class and for discussing papers. Piazza is designed to get you help quickly and efficiently from classmates and from me. Rather than emailing questions to the teaching staff, please post them on Piazza.
Text: There is no required text for this course. The lectures will be based on discussing research papers.

Pre-requisites

Course prerequisites: Advanced Operating Systems (CS 736) or Big Data Systems (CS 744), or an equivalent course.

Grading

Paper Reviews: 10%
Class Participation: 10%
Paper presentation: 20%
Assignments (10% each): 20%
Final Project (in groups): 40%

Schedule

Class Date	Reading	Lecture Material	Notes
1/25	None	Slides | Slides+Notes	Sign up for Piazza!
	Compute
1/27	The GPU Computing Era (Shivaram)
2/1	cuDNN paper, Tutorial | Algorithms (Optional)
2/3	Autograd and JAX
2/8	Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
2/10	Continuous profiling: where have all the cycles gone? | nvprof tutorial (Optional)		Assignment 1 released
	Communication
2/15	Collective Communication: theory, practice, and experience (Shivaram)		Submit project title / abstract
2/17	A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
2/22	Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks
2/24	Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
3/1	Project Proposal Presentations
3/3	Magpie | NCCL Talk		Assignment 2 released; Assignment 1 due
	Serving
3/8	SEDA: An Architecture for Well-Conditioned, Scalable Internet Services (Shivaram)
3/10	Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
3/15	Spring Break
3/17	Spring Break
3/22	A Hardware-Software Blueprint for Flexible Deep Learning Specialization | Automatic Generation of High-Performance Quantized Machine Learning Kernels (Optional)
3/24	A Tensor Compiler for Unified Machine Learning Prediction Serving
	Hyperparameter Tuning, Scheduling
3/29	Omega: flexible, scalable schedulers for large compute clusters (Shivaram)
3/31	A System for Massively Parallel Hyperparameter Tuning | RubberBand: Cloud-based Hyperparameter Tuning (Optional)
~~4/7~~ 4/5	Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine		Guest Lecture
~~4/5~~ 4/7	Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
	Applications
4/12	Understanding Training Efficiency of Deep Learning Recommendation Models at Scale | Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training (Optional)
4/14	Data Movement Is All You Need: A Case Study on Optimizing Transformers
4/19	DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
4/21	Reinforcement Learning: RLScope
4/26	Summary
4/28	Project Presentation
5/3	Project Presentation
5/5	Project Presentation