CS 839 Advanced Machine Learning Systems, Spring 2022

This course covers a wide range of topics in the design and implementation of systems for machine learning, including efficient model training and inference, and specialized systems for graph learning, recommendation, and other applications.

Course Learning Objectives

At the end of the course you will be able to

Logistics

Pre-requisites

Course prerequisites: Advanced Operating Systems (CS 736), Big Data Systems (CS 744), or an equivalent course.

Grading

Schedule

Class Date Reading Lecture Material Notes
1/25 None Slides Slides+Notes Sign up for Piazza!
Compute
1/27 The GPU Computing Era (Shivaram)
2/1 cuDNN paper, Tutorial | Algorithms (Optional)
2/3 Autograd and JAX
2/8 Rammer: Enabling Holistic Deep Learning Compiler Optimizations with rTasks
2/10 Continuous profiling: where have all the cycles gone? | nvprof tutorial (optional) Assignment 1 released
Communication
2/15 Collective Communication: theory, practice, and experience (Shivaram) Submit project title / abstract
2/17 A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters
2/22 Accelerating Collective Communication in Data Parallel Training across Deep Learning Frameworks
2/24 Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
3/1 Project Proposal Presentations
3/3 Magpie | NCCL Talk Assignment 2 released. Assignment 1 due
Serving
3/8 SEDA: An Architecture for Well-Conditioned, Scalable Internet Services (Shivaram)
3/10 Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
3/15 Spring Break
3/17 Spring Break
3/22 A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Automatic Generation of High-Performance Quantized Machine Learning Kernels (Optional)
3/24 A Tensor Compiler for Unified Machine Learning Prediction Serving
Hyperparameter Tuning, Scheduling
3/29 Omega: flexible, scalable schedulers for large compute clusters (Shivaram)
3/31 A System for Massively Parallel Hyperparameter Tuning
RubberBand: Cloud-based Hyperparameter Tuning (Optional)
4/5 Marius++: Large-Scale Training of Graph Neural Networks on a Single Machine Guest Lecture
4/7 Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads
Applications
4/12 Understanding Training Efficiency of Deep Learning Recommendation Models at Scale
Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training (Optional)
4/14 Data Movement Is All You Need: A Case Study on Optimizing Transformers
4/19 DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
4/21 Reinforcement Learning: RLScope
4/26 Summary
4/28 Project Presentation
5/3 Project Presentation
5/5 Project Presentation