CS 784: Advanced Topics in Database Management Systems
Spring 2010, Tue/Thur 1:00-2:15pm, room 1325 COMP S&ST Cowzone
Contact information is available from my homepage. Office hours: Tue/Thur 2:15-3:15pm
and by appointment (please send email, thanks).
The official name of this course is
"Data Models and Languages", a legacy name. This semester I intend to cover
interesting material that is not covered in CS 564 or CS 764 and that is relevant to
current research and industrial development in the broad
context of data management.
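Prerequisites: Undergraduate knowledge of relational
databases is highly recommended. If you lack it, you should be willing to do a
"crash course" on the topic in the first few weeks. The recommended books
for the crash course are:
- The Cow Book (Ramakrishnan and Gehrke, Database Management Systems), or
- The Complete Book (Garcia-Molina, Ullman, and Widom, Database Systems: The Complete Book).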
To put this course in context:
- CS 564 is "everything you should know so that you can get an industrial
job working with relational databases",
- CS 764 is "all the gory details you may (or
may not) want to know about relational data management systems", and
- CS 784 is "all
the stuff beyond relational data (e.g., Web, text, data mining, data integration, data extraction) that you should know to broaden your data management
knowledge or to work in the field as an advanced developer/researcher".
The course meets twice a week to discuss research papers. You are
required to read the assigned paper before each lecture and to attend
the lectures. There will be a midterm, a final, and
an optional project.
- Midterm: Mar 18, in class at the usual time and room.
- Final: May 6, in class at the usual time and room.
- Other important dates: no class from Mar 27 until Apr 4 (spring break); the last class is May 6.
Grade: If you do the project, then midterm: 30%,
final: 30%, project: 30%, participation in the class: 10%. Otherwise,
midterm: 45%, final: 45%, participation in the class: 10%.
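As a quick illustration of the two grading schemes above, here is a minimal Python sketch. The function name `final_grade` and the assumption that every score is on a 0-100 scale are hypothetical, chosen only for this example; they are not part of the course policy.

```python
from typing import Optional


def final_grade(midterm: float, final: float, participation: float,
                project: Optional[float] = None) -> float:
    """Weighted course grade.

    Assumption (hypothetical): all scores are on a 0-100 scale.
    Pass `project=None` if you did not do the optional project.
    """
    if project is not None:
        # With the project: midterm 30%, final 30%, project 30%, participation 10%.
        return 0.30 * midterm + 0.30 * final + 0.30 * project + 0.10 * participation
    # Without the project: midterm 45%, final 45%, participation 10%.
    return 0.45 * midterm + 0.45 * final + 0.10 * participation
```

For example, `final_grade(80, 90, 100)` gives about 86.5, while adding `project=70` gives about 82. In both schemes the weights sum to 100%, so a perfect score on every component yields 100.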
The course schedule and paper list are below (they may be revised slightly
as the course progresses). Each paper will be covered in 1-2 lectures.
Read Chapter 24 (Deductive Databases) of the Cow Book.
Deductive databases (Datalog), Ullman notes
Evaluation of recursive programs (scan it only)
Several chapters from a
textbook-in-progress on data integration (I will send out the book
In case you want to read more:
- Overview, virtual integration (Chapters 1-2)
- Query unfolding, query containment, and answering queries using
views (Chapter 3. Read only 3.1, 3.2.1, 3.2.2 (skim only), 3.3.1, 3.3.2,
3.3.3, and the bucket algorithm in 3.3.4)
- Describing data sources
(Chapter 4. Read only 4.1 and 4.2)
- Creating semantic mappings (Chapter 5. Read only 5.1, 5.2, 5.3,
5.4, the preamble of 5.5, 5.6, and the first part of 5.9
(up to just before the heading "Searching a set of possible schema
mapping")). PPT slides presented in class.
- Data mapping (no reading; material covered in class)
- Other integration approaches (Read
this two-page statement).
IR / Web Search / Large-scale data analysis
Read Chapter 27 (IR and XML Data) of the Cow Book, but only sections
27.1 to 27.5.
Web search, Pagerank
MapReduce: simplified data processing on large clusters
Managing information extraction
You can find the SIGMOD tutorial, from which I created the above
Datalog applied to information extraction
Wrapper induction for information extraction
In case you want to read more:
Data Warehousing, OLAP
Read Chapter 25 (Data Warehousing and Decision Support) of the Cow Book.
An overview of data warehousing and OLAP technology
Data cube: a relational aggregation operator generalizing
group-by, cross-tab, and sub-totals
Read Chapter 26 (Data Mining) of the Cow Book.
Data mining: association rules
Data mining: clustering
MapReduce and parallel DBMSs: friends or foes?
MapReduce: a flexible data processing tool
community wikipedias: a human-machine approach
systems on the World-Wide Web
Keyword search over multiple RDBMSs
Scalable Semantic Web data management using vertical partitioning
Details to be posted later