CS 784: Advanced Topics in Database Management Systems

Fall 2015, Wed/Fri 2:30-3:45pm, room 113 Psychology Bldg


AnHai Doan, contact information available from my homepage. Office hours: Fri 4-5pm and by appointment (pls send email, thanks).

Course Description
The official name of this course is "Data models and Languages", a legacy name left over from the past. What the course actually covers is fundamental and hot data management issues beyond relational data management. The goals are to help students prepare for the database qualifying exam and to expose them to current hot and interesting trends in beyond-relational data management.

Prerequisites
Undergraduate knowledge of relational databases is highly recommended. If you do not have it, you should be willing to do a "crash course" on the topic in the first few weeks. The recommended books for the crash course are The Cow Book or The Complete Book.

Some knowledge of machine learning (especially supervised learning) is helpful. If you haven't had any exposure to machine learning before, you can read up on the topic in the first few weeks of the class. We will also cover the most basic stuff of supervised learning in the first few lectures.

Knowing Python is helpful for the class project. If you don't know it, use this as an excuse to soak it up this semester. It is a relatively easy language to learn and start using. We will discuss prerequisites more in the class.

Course Format
The course meets twice a week to discuss research papers. You are required to read the specified paper/textbook chapter/slides before each lecture and attend the lectures. There will be a midterm, a final, and a project.

Midterm: TBD, in class at the usual time/room.
Final: TBD, in class at the usual time/room.
Other important dates: first class: Wed Sept 2, Thanksgiving break: Nov 26 - Nov 29, last class: Fri Dec 11.

Grade: Midterm: 30%, final: 30%, project: 40%.

Course Schedule
Course schedule and the paper list are below (may be revised slightly as the course progresses). Each paper will be covered in 1-2 lectures. Some topics below refer to chapters in a textbook. I will email the scanned copies of these chapters. Slides for the chapters are available on the book's website.

The Big Picture and Preliminaries

Data Acquisition and Pre-processing

Data Integration

Data Exploration and Analysis

Other Issues Beyond Data Science

Misc Reading

Potentially interesting stuff. I haven't read these carefully.

Students will form 2-person teams for a multi-stage project that addresses a data science problem. We will discuss the details in class.