Foundation Models

CS 839, Fall 2024
Department of Computer Sciences
University of Wisconsin–Madison


Logistics

  • Instructor Email: fredsala@cs.wisc.edu
  • Instructor Office Hours: Thursdays, 3:00-4:30 PM
  • Instructor Office Location: CS 5385

  • TA Email: cromp@wisc.edu
  • TA Office Hours: Fridays, 1:00-2:00 PM
  • TA Office Location: CS 3205

  • Piazza Webpage (for discussion and notification): https://piazza.com/class/m0mktyotdhl2zw
  • Homework Submission: Canvas
  • Note: for email, please put [CS839] in the subject title. Thanks!

    Course Description

    Description: Large pretrained machine learning models, also known as foundation models, have taken the world by storm. Models like ChatGPT, Claude, and Stable Diffusion have astonishing abilities to answer questions, speak with users, and generate sophisticated art---all without any additional training. This course covers all aspects of these fascinating models. We will learn how such models are built, including data acquisition, selecting model architectures, and pretraining approaches. Next, a significant focus is how to use and deploy foundation models, including prompting strategies, providing in-context examples, fine-tuning, integrating into existing data science pipelines, and more. We discuss recent advances that improve foundation models, such as large-scale human feedback. Finally, we cover the potential societal impacts of these models. Familiarity with basic machine learning is assumed.

    This course will have two parts. In the first part, we will cover

    • Basic building blocks: transformers, attention, new subquadratic architectures
    • Popular large pretrained models: encoder-only, encoder-decoder, decoder-only, along with models for other data modalities (and multimodal)
    • Using foundation models: prompting, in-context learning, fine-tuning and specialization
    • Training and inference efficiency, selecting data, and scaling
    • Analysis: evaluation, security, privacy, societal impacts
    In the second part, we use our new skills to read and understand cutting-edge papers in this field. Students will break into groups and present papers, focusing on understanding and explaining the key results and comparing them with other works.

    A sampling of the papers we will read, understand, and present in this course can be found on the schedule page.

    Prerequisites

    Familiarity with basic machine learning is assumed. We will review advanced material as needed, but experience and maturity are recommended.

    Grading

    The grading for the course will be based on the following (tentative, subject to change):

    • Homework Assignments (3 anticipated): 30%
    • Class Presentations: 30%
    • Final Project: 40%

    Project And Presentation Policies

    The presentations and projects will be done in groups of about 3-6 students. Both will include proposals and check-ins with the instructor. More details will be provided during class.

    The goal of the presentation is for students to read and present a cutting-edge paper on foundation models/large language models/large pretrained models. Very likely these will be papers published during the timeframe of the class itself---this is a fast-moving field! Students will summarize and discuss the key advancements in the paper, its connection to the literature covered in class, and how the ideas are validated.

    The goal of the project is to identify a suitable problem in understanding, applying, or extending foundation models and to propose and validate an idea tackling this problem. In the ideal case, the resulting project will be a starting point for a submission at a top-tier machine learning conference. Note: compute limitations are likely to be a factor; discuss plans with the instructor well in advance :)

    General Homework Policies and Academic Misconduct

    All homework assignments must be done individually. Cheating and plagiarism will be dealt with in accordance with University procedures (see the Academic Misconduct Guide for Students).