CS 784: Advanced Topics in Database Management Systems

Spring 2009, Wed/Fri 11:00-12:15pm, room 1257 COMP S&ST


Instructor: AnHai Doan; contact information is available from my homepage. Office hours: Wed 1:15-2pm and Fri 1:15-2pm, or by appointment (please send email).

Course Description
The official name of this course is "Data Models and Languages", a legacy name left over from the past. This semester I intend to cover interesting material that is not covered in CS 564 or CS 764 and that is relevant to current research and industrial development in the broad context of data management.


Prerequisites: Undergraduate knowledge of relational databases is highly recommended. If you do not have it, you should be willing to do a "crash course" on the topic in the first few weeks. The recommended books for the crash course are The Cow Book or The Complete Book.

Course Format
The course meets twice a week to discuss research papers. You are required to read the specified paper before each lecture and attend the lectures. There will be a midterm, a final, and a project.

Midterm: March 13, in class at the usual time and room.
Final: date to be decided, in class at the usual time and room.
Other important dates: Feb 4: no class; March 18, 20: no class, spring break; last class is May 8.

Grading: midterm 30%, final 30%, project 35%, class participation 5%.

Course Schedule
The paper list is below (may be revised slightly as the course progresses). Each paper will be covered in 1-2 lectures.

Intro to the class (read Sections 1-2 of the Cimple paper); also read this paper and this paper

IR / Web Search
IR overview
Web search, Pagerank
Web search, Google
Web search and RDBMS

Data Languages
Note: this part will form the foundation for you to study information extraction and integration.
On the universality of data retrieval languages
Deductive databases (Datalog), Ullman notes
Evaluation of recursive programs

Information Extraction
Managing information extraction
You can find the SIGMOD tutorial, from which I created the above lecture, here.
Datalog applied to information extraction
Wrapper induction for information extraction

In case you want to read more:

Data Integration
Several chapters from a textbook-in-progress on data integration (I will send out the book draft shortly):
In case you want to read more:

Others
Wiki/Web 2.0/mass collaboration, and also the CACM survey paper (will be emailed to the class on Sunday or early Monday)
Keyword search over multiple RDBMSs
Data mining: association rules (not covered, no need to read)
Data mining: clustering (not covered, no need to read)
Column store vs. row store and hardware trends, DeWitt's note (not covered, no need to read)
Scalable Semantic Web data management using vertical partitioning

By Fri Feb 27: each team, please send me an email listing the names and email addresses of the team members. Each team has 1-2 people.

By Wed Mar 4: each team, please send me an email briefly describing the project topic.