Foundation Models

CS 639, Spring 2026
Department of Computer Sciences
University of Wisconsin–Madison


Logistics

Note: for email, please put [CS639] in the subject line. Thanks!

Course Description

Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like the GPT family, Claude, and Gemini have astonishing abilities to answer questions, speak with users, and generate sophisticated content. This course covers all aspects of these fascinating models. We will start with a broad review/introduction to modern neural networks and artificial intelligence. We will then learn how foundation models are built, including model architectures, pre-training, post-training, and adaptation. Next, a significant focus is how to use and deploy foundation models, including prompting strategies, providing in-context examples, fine-tuning, integrating into existing data science pipelines, and more. We discuss recent advances in applying foundation and large language models, including reasoning, agents, and systems built on top of LLMs. Finally, we cover the potential societal impacts of these models.

This course will cover the following topics:

  • Short introduction to machine learning & neural networks: supervised learning, self-supervised learning, deep neural networks, training, evaluation
  • Building blocks: transformers and attention, emerging non-transformer architectures
  • Model families & modalities: encoders, decoders, multimodal components
  • Pretraining & data: training objectives, data, scaling laws
  • Post-training, alignment, & reasoning: RLHF and variants, chain-of-thought, RL with verifiable rewards for reasoning
  • Using foundation models: prompting and in-context learning, parameter-efficient tuning, quantization, RAG
  • Agents: tool use, planning and memory, training and evaluation in interactive environments
  • Evaluation, safety, and deployment: benchmarks & LLM-as-a-judge, robustness/privacy/red-teaming, societal impacts

A sampling of the papers we will read, understand, and present in this course can be found on the schedule page.

Prerequisites

We assume students have familiarity with basic machine learning. The prerequisites are:

  • (CS 320 or 400) and (Math 320, 340, 341, 345 or 375), or
  • CS 540, or
  • Graduate/Professional Standing

Grading

Grading for the course will be based on the following (tentative, subject to change):

  • Homework Assignments (approximately 6 anticipated): 50%
  • Midterm Exam: 20%
  • Final Project: 30%

Project Policies

Projects will be done in groups of about 5-10 students and will include proposals and check-ins with the instructor. More details will be presented during class.

The goal of the project is to identify a suitable problem in understanding, applying, or extending foundation models and to propose and validate an idea tackling this problem. Note: compute limitations are likely to be a factor; discuss plans with the instructor well in advance :)

General Homework Policies and Academic Misconduct

All homework assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students).