Foundation Models

CS 839, Fall 2025
Department of Computer Sciences
University of Wisconsin–Madison


Logistics

  • Time: TR 9:30 AM - 10:45 AM
  • Location: 2532 Morgridge Hall
  • Instructor: Frederic Sala
  • Instructor Office Hours: Thursdays 2:30 PM - 4:00 PM
  • Instructor Office Location: 5514 Morgridge Hall
  • TA: Changho Shin
  • TA Office Hours: Mondays 1:00 PM - 2:00 PM
  • TA Office Location: 5548 Morgridge Hall
  • Piazza: here
  • Homework Submission: Canvas

Note: for email, please include [CS839] in the subject line. Thanks!

Course Description

Description: Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Stable Diffusion have astonishing abilities to answer questions, converse with users, and generate sophisticated art, all without any additional training. This course covers all aspects of these fascinating models. We will learn how such models are built, including data acquisition, selecting model architectures, and pretraining approaches. Next, a significant focus is on how to use and deploy foundation models, including prompting strategies, providing in-context examples, fine-tuning, integrating them into existing data science pipelines, and more. We discuss recent advances that improve foundation models, such as learning from large-scale human feedback. Finally, we cover the potential societal impacts of these models. Familiarity with basic machine learning is assumed.

This course will have two parts. In the first part, we will cover:

  • Building blocks: tokenization; transformers/attention; emerging novel architectures
  • Model families & modalities: encoder-only, encoder–decoder, decoder-only; introduction to multimodal models
  • Pretraining & data: training objectives, data, scaling laws
  • Using foundation models: prompting and in-context learning; parameter-efficient tuning (e.g., LoRA/QLoRA/DoRA) and quantization
  • Efficiency: training (parallelism, memory) and inference (speculative/assisted decoding, caching)
  • Post-training & alignment: RLHF, DPO/RLAIF, and reinforcement learning methods (e.g., with verifiable rewards)
  • Reasoning: chain-of-thought, self-consistency, program/tree-of-thought, verification/process supervision
  • Agents: tool use, planning & memory, training & evaluation in interactive environments
  • Evaluation, safety, and deployment: benchmarks & LLM-as-a-judge, robustness/privacy/red-teaming, societal impacts
In the second part, we will use our new skills to read and understand cutting-edge papers in this field. Students will break into groups and present papers, focusing on understanding and explaining the key results and comparing them with related work.

A sampling of the papers we will read, understand, and present in this course can be found on the schedule page.

Prerequisites

Familiarity with basic machine learning is assumed. We will review advanced material as needed, but experience and maturity are recommended.

Grading

The grading for the course will be based on the following (tentative, subject to change):

  • Homework Assignments (3 anticipated): 30%
  • Class Presentations: 30%
  • Final Project: 40%

Project and Presentation Policies

The presentations and projects will be done in groups of 3-6 students. Both will include proposals and check-ins with the instructor. More details will be presented during class.

The goal of the presentation is for students to read and present a cutting-edge paper on foundation models/large language models/large pretrained models. Very likely, these will be papers published during the semester itself; this is a fast-moving field! Students will summarize and discuss the key advances in the paper, its connection to the literature covered in class, and how its ideas are validated.

The goal of the project is to identify a suitable problem in understanding, applying, or extending foundation models, and to propose and validate an idea that tackles it. In the ideal case, the resulting project will be a starting point for a submission to a top-tier machine learning conference. Note: compute limitations are likely to be a factor; discuss plans with the instructor well in advance :)

General Homework Policies and Academic Misconduct

All homework assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students).