CS 784: Advanced Topics in Database Management Systems

Spring 2010, Tue/Thur 1:00-2:15pm, room 1325 COMP S&ST Cowzone


Instructor: AnHai Doan; contact information is available from my homepage. Office hours: Tue/Thur 2:15-3:15pm and by appointment (please send email).

Course Description
The official name of this course is "Data Models and Languages", a legacy name. This semester, I intend to cover interesting material that is not covered in 564 or 764 and that is relevant to current research and industrial development in the broad context of data management.


Prerequisites: Undergraduate knowledge of relational databases is highly recommended. If you do not have this background, you should be willing to take a "crash course" on the topic in the first few weeks. The recommended books for the crash course are The Cow Book or The Complete Book.

Course Format
The course meets twice a week to discuss research papers. You are required to read the specified paper before each lecture and attend the lectures. There will be a midterm, a final, and an optional project.

Midterm: Mar 18, in class at usual time/room.
Final: May 6, in class at usual time/room.
Other important dates: Mar 27 until Apr 4: no class, spring break; last class is May 6.

Grade: If you do the project, then midterm: 30%, final: 30%, project: 30%, participation in the class: 10%. Otherwise, midterm: 45%, final: 45%, participation in the class: 10%.

Course Schedule
The course schedule and paper list are below (they may be revised slightly as the course progresses). Each paper will be covered in 1-2 lectures.

Read Chapter 24 (Deductive Databases) of the Cow Book.
Deductive databases (Datalog), Ullman notes
Evaluation of recursive programs (scan it only)

Data Integration
Several chapters from a textbook-in-progress on data integration (I will send out the book draft shortly).

In case you want to read more:

IR / Web Search / Large-scale data analysis
Read Chapter 27 (IR and XML Data) of the Cow Book, but only from 27.1 to 27.5.
IR overview
Web search, Pagerank
MapReduce: simplified data processing on large clusters

Information Extraction
Managing information extraction
You can find the SIGMOD tutorial from which I created the above lecture here.
Datalog applied to information extraction
Wrapper induction for information extraction

In case you want to read more:

Data Warehousing, OLAP
Read Chapter 25 (Data Warehousing and Decision Support) of the Cow Book.
An overview of data warehousing and OLAP technology
Data cube: a relational aggregation operator generalizing group-by, cross-tab, and sub-totals

Data Mining
Read Chapter 26 (Data Mining) of the Cow Book.
Data mining: association rules
Data mining: clustering

Colliding Worlds
MapReduce and parallel DBMSs: friends or foes?
MapReduce: a flexible data processing tool

Building community wikipedias: a human-machine approach
Mass collaboration systems on the World-Wide Web

Keyword search over multiple RDBMSs

Scalable Semantic Web data management using vertical partitioning

Details to be posted later