CS 880: ALGORITHMS FOR MASSIVE DATASETS

    Engr 1213 MWF 11:00-12:15PM
    Fall 2017

 




    Course description and content

    Historically, computational efficiency in algorithm design has been equated with polynomial running times. Increasingly, algorithms are being applied in scenarios where even linear running times are too expensive and the input data is too large to fit into the main memory of a single machine. In this course we will explore algorithmic ideas and techniques developed specifically for this "big data" regime. The course will cover the following main themes: streaming/sketching, dimension reduction, sublinear time algorithms such as property testing, parallel algorithms and the map-reduce paradigm, and applications to machine learning. Along the way we will study several algorithmic "greatest hits" such as PCA/SVD, locality sensitive hashing, regression, k-means, and compressed sensing. Links to reference material for the course are provided at the bottom of this page.

    Course calendar

    This course is scheduled on M/W/F, but on average we will meet twice a week. Please subscribe to the course Google calendar to keep track of meeting times, as well as homework release and due dates. Use this link to subscribe in iCal format, or this link to view the calendar in HTML format.

    Prerequisites

    Students are required to have taken an undergraduate level algorithms course and an undergraduate level computational complexity course (e.g. CS 577 and CS 520), and are expected to have a thorough understanding of basic algorithm design techniques, randomness, and basic complexity theory. Graduate level theory courses, and in particular a good understanding of the design and analysis of randomized algorithms, are a plus. If you have not taken the aforementioned courses but believe you have the necessary background for this course, please contact the instructor for permission.

    Evaluation

    Course evaluation will be based on several homeworks, a project, and scribe notes. The homeworks are expected to be challenging. Collaboration will be allowed and encouraged, but solutions must be submitted individually. The project will involve self-studying one or more papers related to the course topic, preparing a survey report, and giving a short presentation towards the end of the semester. Suggested topics and instructions for preparing your report will be announced at the beginning of the semester. Each student is also required to do their fair share of scribing lecture notes.

    Instructor: Prof. Shuchi Chawla

    Office Hours: Fridays 1:00-3:00 pm in CS 4373, or by appointment.

    Online Forum: We will use Piazza for all course-related activity -- discussions, announcements, sharing of resources, etc. Lecture notes and homework will also be posted there. Sign up here.

    Textbook/References

    There is no required textbook for this course. For some topics we will use the textbook by Blum, Hopcroft, and Kannan, which is available here for free. For other topics, we will use lecture notes from other sources, and in some cases research papers. See below for links to similar courses elsewhere with lecture notes. We will most frequently use these sources as readings.

    Reading material in the form of lecture notes, surveys, and research papers will be posted over time on Piazza. Students are expected to scribe lectures; these scribe notes will be made available on Piazza in a timely manner.