Foundation Models

CS 839, Fall 2024
Department of Computer Sciences
University of Wisconsin–Madison


Logistics

  • Instructor Email: fredsala@cs.wisc.edu
  • Instructor Office Hours: Thursdays, 3:00-4:30 PM
  • Instructor Office Location: CS 5385

  • TA Email: cromp@wisc.edu
  • TA Office Hours: Fridays, 1:00-2:00 PM
  • TA Office Location: CS 3205

  • Piazza Webpage (for discussion and notification): https://piazza.com/class/m0mktyotdhl2zw
  • Homework Submission: Canvas
  • Note: for email, please put [CS839] in the subject title. Thanks!

    Course Description

    Description: Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Stable Diffusion have astonishing abilities to answer questions, speak with users, and generate sophisticated art---all without any additional training. This course covers all aspects of these fascinating models. We will learn how such models are built, including data acquisition, selecting model architectures, and pretraining approaches. Next, a significant focus is how to use and deploy foundation models, including prompting strategies, providing in-context examples, fine-tuning, integrating into existing data science pipelines, and more. We discuss recent advances that improve foundation models, such as large-scale human feedback. Finally, we cover the potential societal impacts of these models. Familiarity with basic machine learning is assumed.

    This course will have two parts. In the first part, we will cover

    • Basic building blocks: transformers, attention, new subquadratic architectures
    • Popular large pretrained models: encoder-only, encoder-decoder, decoder-only, along with models for other data modalities (and multimodal)
    • Using foundation models: prompting, in-context learning, fine-tuning and specialization
    • Training and inference efficiency, selecting data, and scaling
    • Analysis: evaluation, security, privacy, societal impacts
    In the second part, we use our new skills to read and understand cutting-edge papers in this field. Students will break into groups and present papers, focusing on understanding and explaining the key results and comparing them with other works.

    A sampling of the papers we will read, understand, and present in this course can be found on the schedule page.

    Prerequisites

    Familiarity with basic machine learning is assumed. We will review advanced material as needed, but experience and maturity are recommended.

    Grading

    The grading for the course will be based on the following (tentative, subject to change):

    • Homework Assignments (3 anticipated): 30%
    • Class Presentations: 30%
    • Final Project: 40%

    Project And Presentation Policies

    The presentations and projects will be done in groups of about 3-6 students. Both will include proposals and check-ins with the instructor. More details will be provided during class.

    The goal of the presentation is for students to read and present a cutting-edge paper on foundation models/large language models/large pretrained models. Very likely these will be papers published during the timeframe of the class itself---this is a fast-moving field! Students will summarize and discuss the key advancements in the paper, its connection to the literature covered in class, and how the ideas are validated.

    The goal of the project is to identify a suitable problem in understanding, applying, or extending foundation models and to propose and validate an idea tackling this problem. In the ideal case, the resulting project will be a starting point for a submission at a top-tier machine learning conference. Note: compute limitations are likely to be a factor; discuss plans with the instructor well in advance :)

    General Homework Policies and Academic Misconduct

    All homework assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students).