CS 839, Fall 2023
Department of Computer Sciences
University of Wisconsin–Madison
Description: Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Stable Diffusion have astonishing abilities to answer questions, speak with users, and generate sophisticated art---all without any additional training. This course covers all aspects of these fascinating models. We will learn how such models are built, including data acquisition, selecting model architectures, and pretraining approaches. Next, a significant focus is how to use and deploy foundation models, including prompting strategies, providing in-context examples, fine-tuning, integrating into existing data science pipelines, and more. We discuss recent advances that improve foundation models, such as large-scale human feedback. Finally, we cover the potential societal impacts of these models. Familiarity with basic machine learning is assumed.
This course will have two parts. In the first part, we will cover
A sampling of the papers we will read, understand, and present in this course can be found on the schedule page.
Familiarity with basic machine learning is assumed. We will review advanced material as needed, but experience and maturity are recommended.
The grading for the course will be based on (tentative, subject to change):
The presentations and projects will be done in groups of about 3-6 students. Both will include proposals and check-ins with the instructor. More details will be presented during class.
The goal of the presentation is for students to read and present a cutting-edge paper on foundation models/large language models/large pretrained models. Very likely these will be papers published during the timeframe of the class itself---this is a fast-moving field! Students will summarize and discuss the key advancements in the paper, its connection to the literature covered in class, and how the ideas are validated.
The goal of the project is to identify a suitable problem in understanding, applying, or extending foundation models and to propose and validate an idea tackling this problem. In the ideal case, the resulting project will be a starting point for a submission at a top-tier machine learning conference. Note: compute limitations are likely to be a factor; discuss plans with the instructor well in advance :)
All homework assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students).