CS 839, Fall 2025
Department of Computer Sciences
University of Wisconsin–Madison
Description: Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Gemini have astonishing abilities to answer questions, speak with users, and generate sophisticated art. This course surveys how these systems are built and used: data curation, tokenization, architectures (Transformers and subquadratic variants), pretraining objectives, scaling laws, and efficient training/inference. We then study post-training and reasoning: prompting and in-context learning, chain-of-thought and verification, preference-based learning (RLHF/DPO/RLAIF) and reinforcement approaches like RL with verifiable rewards. We will have the opportunity to study agents, fine-tuning and adaptation (e.g., LoRA/quantization), multimodal models, evaluation and LLM-as-a-judge, deployment and safety (robustness, privacy, and red-teaming), and societal impacts. Familiarity with basic machine learning is assumed.
We will review advanced material as needed, but prior experience and mathematical maturity are recommended.