Foundation Models

CS 839, Fall 2025
Department of Computer Sciences
University of Wisconsin–Madison


Logistics

Course Description

Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Gemini have astonishing abilities to answer questions, converse with users, and generate sophisticated art. This course surveys how these systems are built and used: data curation, tokenization, architectures (Transformers and subquadratic variants), pretraining objectives, scaling laws, and efficient training and inference. We then study post-training and reasoning: prompting and in-context learning, chain-of-thought and verification, preference-based learning (RLHF/DPO/RLAIF), and reinforcement learning approaches such as RL with verifiable rewards. We will also have the opportunity to study agents, fine-tuning and adaptation (e.g., LoRA and quantization), multimodal models, evaluation and LLM-as-a-judge, deployment and safety (robustness, privacy, and red-teaming), and societal impacts. Familiarity with basic machine learning is assumed.

Prerequisites

Familiarity with basic machine learning is assumed. We will review advanced material as needed, but prior experience and maturity are recommended.

Announcements

  • The Piazza code is introtofm.
  • Welcome! Please see the syllabus page for details about the class and the schedule page to get a sense of the material covered in this class.