CS 784 Advanced Topics in Database Management Systems: Data Science

Spring 2016, Wed/Fri 1-2:15pm, Room 1325 CS Bldg


Instructor: AnHai Doan; contact information is available from my homepage. Office hours: Fri 4-5pm and by appointment (please send email, thanks).

Course Description
The official name of this course is "Data models and Languages", a legacy name that we hope to change soon. This semester the course is an introduction to data science. You can get an idea of what will be covered by looking at the course schedule below.

Prerequisites: Undergraduate knowledge of relational databases is highly recommended. If you lack this background, you should be willing to do a "crash course" on the topic in the first few weeks. The recommended books for the crash course are The Cow Book or The Complete Book.

Some knowledge of machine learning (especially supervised learning) is helpful. If you haven't had any exposure to machine learning before, you can read up on the topic in the first few weeks of the class. We will also cover the basics of supervised learning in the first few lectures.

Knowing Python is helpful for the class project. If you don't know it, use this as an excuse to pick it up this semester. It is a relatively easy language to learn and start using. We will discuss prerequisites further in class.

Course Format
The course meets twice a week to discuss research papers. You are required to read the specified paper/textbook chapter/slides before each lecture and attend the lectures. There will be a midterm, a final, and a project.

Midterm: Fri Mar 18, in class at usual time/room.
Final: Mon May 9, in class at usual time/room.
Other important dates: first class: Wed Jan 20; spring break: Mar 19 - Mar 27; last class: Fri May 6.

Grade: Midterm: 30%, final: 30%, project: 40%.

Lecture Slides
Slides presented in the class, in chronological order.

Course Schedule
The course schedule and paper list are below (they may be revised slightly as the course progresses). Each paper will be covered in 1-2 lectures. Some topics below refer to chapters in a textbook. I will email scanned copies of these chapters. Slides for the chapters are available on the book's website.

The Big Picture and Preliminaries

Data Acquisition and Pre-processing

Data Integration

Data Exploration and Analysis

Other Issues

Beyond Data Science

Misc Reading

Potentially interesting stuff. I haven't read these carefully.

Project
Students will form 2-person teams for a multi-stage project that addresses a data science problem. We will discuss the details in class.