CS 784: Advanced Topics in Database Management Systems

Spring 2008, Tue/Thur 2:30-3:45pm, room 1289


Instructor: AnHai Doan; contact information is available from my homepage. Office hours: Tue/Thur 3:45-4:30pm and by appointment (please send email, thanks).

Course Description
The official name of this course is "Data models and Languages", a legacy name left over from the past. This semester, I intend to cover interesting material that is not covered in 564 or 764 and that is relevant to the research and industrial development going on today in the broad context of data management.

Prerequisites: Undergraduate knowledge of relational databases is highly recommended. If you do not have this background, you should be willing to do a "crash course" on the topic in the first few weeks. The recommended books for the crash course are the Cow Book or the Complete Book.

Course Format
The course meets twice a week to discuss research papers. You are required to read the specified paper before each lecture and attend the lectures. There will be three exams, spread roughly evenly throughout the semester, and a project.

First exam: Tue Feb 26, in class at usual time/room,
Second exam: Tue Apr 8, in class at usual time/room,
Third exam: Tue May 13, in class at usual time/room.
Other important dates: Jan 29: no class; March 18, 20: no class, spring break; last class is May 8.

Grade: the three exams and the project will each be worth 25% of the grade.

Course Schedule
The paper list is below (may be revised slightly as the course progresses). Each paper will be covered in 1-2 lectures.

Intro to the class (read Sections 1-2 of the Cimple paper)
On the universality of data retrieval languages
Deductive databases (Datalog), Ullman notes
Evaluation of recursive programs
Managing information extraction
You can find the SIGMOD tutorial from which I created the above lecture here.
Datalog applied to information extraction
Data integration: several chapters from a textbook-in-progress on data integration (you should have received these by now)

wiki/Web 2.0/mass collaboration
IR overview
Web search, PageRank
Web search, Google
Web search and RDBMS
Keyword search over multiple RDBMSs
Data mining: association rules
Data mining: clustering
Column store vs. row store and hardware trends, DeWitt's note
Scalable Semantic Web data management using vertical partitioning

The project submission deadline is Monday May 19, by 11am. Please email me a PDF copy of your project report AND slide a hard copy of the report under my office door. The PDF copy is for record keeping, and the hard copy is for grading.

As discussed in class, your project report is not required to adhere to any fixed format. If you still have any questions about this, please let me know.