CS 784: Advanced Topics in Database Management Systems
Spring 2013, Mon/Wed 2:30-3:45pm, room 1257 COMP S&ST
- The first lecture of the class was scheduled for Wed Jan 23, but I
cannot hold it then because of an important faculty meeting. The first
lecture will instead be held on Friday Jan 25, 2:30-3:45pm.
- Welcome. Make sure you are on the class mailing list:
firstname.lastname@example.org. If you are registered for the class,
you should have been added to the list automatically.
- While the class normally meets just on Mondays and
Wednesdays, please reserve the 2:30-3:45pm slots on
Fridays. I will use these slots for additional and make-up lectures.
Instructor: AnHai Doan (contact information is available from my
homepage). Office hours: Mon/Wed 4-5pm (right after lectures) and by
appointment (please send email).
The official name of this course is
"Data models and Languages", a legacy name left over from the
past. What the course actually covers is fundamental and hot data
management issues beyond relational data management. The goals are to
help students prepare for the database qualifying exam and to expose
them to current, interesting trends in beyond-relational
data management. Another way to view this is:
- CS 564 is "everything you should know so that you can get an industrial
job working with relational databases",
- CS 764 is "all the gory details you may (or
may not) want to know about relational data management systems", and
- CS 784 is "all
the stuff beyond relational data (e.g., Web, text, data mining, data integration, data extraction) that you should know to broaden your data management
knowledge or to work in the field as an advanced developer/researcher".
Prerequisites: Undergraduate knowledge of relational
databases is highly recommended. If you do not have it, you should be willing to do a
"crash course" on the topic in the first few weeks. The recommended books
for the crash course are:
The Cow Book, or
The Complete Book.
The course meets twice a week to discuss research papers. You are
required to read the assigned paper, textbook chapter, or slides before each lecture and to attend
the lectures. There will be a midterm, a final, and a project.
Midterm: Wed Mar 20, in class at usual time/room,
Final: TBD, in class at usual time/room,
Other important dates: Mar 23-31: spring break; last class: Fri May 10.
Grade: Midterm: 30%, final: 30%, project: 30%,
participation in the class: 10%.
The course schedule and the paper list are below (they may be revised slightly
as the course progresses). Each paper will be covered in 1-2 lectures.
Several chapters from a data integration
textbook (available on Amazon; I will send out the chapters that you have to read shortly). Slides for these chapters are available from the book's Web site.
- Overview, big picture, key issues (Chapter 1)
- Creating semantic mappings (Chapter 5, PPT slides presented in class): Read 5.1 to 5.5, scan 5.6, read 5.7 to 5.9, scan 5.10.
- String matching, entity resolution (Chapters 4 and 7): Read 4.1,
4.2.1 (only "Edit Distance"), 4.2.2 (only "Overlap", "Jaccard", and
"TF/IDF"), 4.2.4, and 4.3 (only "Inverted Index" and "Size Filtering").
Read 7.1, 7.2, 7.3, 7.4, 7.5.1, 7.5.2, 7.5.3, and 7.6 (only the preamble).
- Wrapper construction (Chapter 9): Read 9.1, 9.2, 9.3.1, 9.4, 9.5.2.
IR / Web Search
Read Chapter 27 (IR and XML Data) of the Cow Book, but only from
27.1 to 27.5.
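As a study aid for the string matching readings (Chapter 4 of the data integration textbook: edit distance, overlap, Jaccard, inverted index, size filtering), here is a minimal Python sketch. It is an illustration, not code from the textbook: whitespace tokenization and all function names, including the hypothetical `jaccard_join` helper, are assumptions made for this example. The join uses an inverted index to find only the pairs that share a token, then applies the size filter, which follows from Jaccard(x, y) <= min(|x|, |y|) / max(|x|, |y|).

```python
from collections import defaultdict


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn s into t."""
    prev = list(range(len(t) + 1))  # row for the empty prefix of s
    for i, cs in enumerate(s, start=1):
        cur = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def tokens(s: str) -> set:
    """Whitespace tokenization (an assumption; q-grams are another option)."""
    return set(s.lower().split())


def overlap(x: set, y: set) -> int:
    return len(x & y)


def jaccard(x: set, y: set) -> float:
    union = x | y
    return len(x & y) / len(union) if union else 1.0  # two empty sets match


def jaccard_join(records: list, t: float) -> list:
    """Hypothetical helper: all pairs (i, j), i < j, with Jaccard >= t (t > 0)."""
    sets = [tokens(r) for r in records]
    index = defaultdict(list)  # inverted index: token -> ids of records containing it
    for rid, toks in enumerate(sets):
        for tok in toks:
            index[tok].append(rid)
    result = []
    for i, x in enumerate(sets):
        # Candidates share at least one token with x; other pairs have Jaccard 0.
        candidates = {j for tok in x for j in index[tok] if j > i}
        for j in sorted(candidates):
            y = sets[j]
            # Size filter: Jaccard(x, y) <= min(|x|, |y|) / max(|x|, |y|),
            # so skip the pair when that upper bound is already below t.
            if min(len(x), len(y)) < t * max(len(x), len(y)):
                continue
            if jaccard(x, y) >= t:
                result.append((i, j))
    return result
```

For example, `jaccard_join(["data integration", "data integration systems", "web search"], 0.5)` returns `[(0, 1)]`; at threshold 0.7 the size filter discards that pair (sizes 2 and 3) before any Jaccard score is computed.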
Web search, Pagerank
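For intuition on PageRank, here is a minimal power-iteration sketch in Python. The damping factor 0.85, the iteration cap and tolerance, and spreading the rank of dangling pages (pages with no out-links) uniformly over all pages are common conventions assumed here, not details taken from the course readings.

```python
def pagerank(links: dict, d: float = 0.85, iters: int = 100, tol: float = 1e-10) -> dict:
    """links maps each page to the list of pages it links to.
    Returns a dict page -> rank; the ranks sum to 1."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(iters):
        # Assumption: rank held by dangling pages is redistributed evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - d) / n + d * dangling / n for p in pages}
        for p, targets in links.items():
            if targets:
                share = d * rank[p] / len(targets)  # split p's rank over its out-links
                for q in targets:
                    new[q] += share
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank
```

On the three-page graph `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}`, page C ends up with the highest rank (it receives links from both A and B) and B with the lowest; on a simple cycle A -> B -> C -> A every page keeps rank 1/3.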
Read Chapter 24 (Deductive Databases) of the Cow Book.
Deductive databases (Datalog), Ullman notes
Evaluation of recursive programs (scan it only)
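To make the evaluation of recursive programs concrete, here is a sketch of semi-naive evaluation for the standard transitive-closure program in Python. Representing facts as tuples and the function name are assumptions for illustration, and the Datalog notation may differ slightly from the Ullman notes. The key idea is that each round joins only the facts that were new in the previous round (the delta) with `edge`, rather than re-joining all of `path` as naive evaluation does.

```python
def transitive_closure(edges: set) -> set:
    """Semi-naive evaluation of the (linear) recursive Datalog program
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z).
    Facts are (x, y) tuples; returns every derivable path fact."""
    succ = {}  # index edge by its first argument for the join on Y
    for x, y in edges:
        succ.setdefault(x, set()).add(y)
    path = set(edges)   # round 0: the non-recursive rule
    delta = set(edges)  # facts that were new in the last round
    while delta:
        # Join only the new facts with edge; naive evaluation would re-join all of path.
        derived = {(x, z) for (x, y) in delta for z in succ.get(y, ())}
        delta = derived - path  # keep only facts not seen before
        path |= delta
    return path  # fixpoint reached: the last round derived nothing new
```

For the chain `{(1, 2), (2, 3), (3, 4)}` this yields the six facts (1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4). Because cycles only produce facts that are already in `path`, the loop also terminates on cyclic graphs.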
Data Mining (tentative, awaiting syncing with CS 764)
Read Chapter 26 (Data Mining) of the Cow Book.
Data mining: association rules
Data mining: clustering
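For the association rules lecture, here is a compact Python sketch of the Apriori approach. It finds all itemsets that meet a minimum support level by level, prunes any candidate that has a non-frequent subset, and then keeps the rules X -> Y whose confidence support(X ∪ Y) / support(X) meets a threshold. The function names and the use of integer support counts are assumptions for illustration. The Cow Book chapter may define support as a fraction of transactions instead.

```python
from itertools import combinations


def apriori(transactions: list, min_support: int) -> dict:
    """Return {frozenset itemset: support count} for every itemset
    contained in at least min_support transactions."""
    baskets = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for b in baskets for i in b}  # 1-item candidates
    frequent = {}
    k = 1
    while level:
        counts = {c: sum(1 for b in baskets if c <= b) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets, kept only
        # if every (k-1)-subset is frequent (the Apriori property).
        level = set()
        for a, b in combinations(list(current), 2):
            c = a | b
            if len(c) == k and all(frozenset(s) in current for s in combinations(c, k - 1)):
                level.add(c)
    return frequent


def rules(frequent: dict, min_conf: float) -> list:
    """Rules (X, Y, confidence) with confidence = support(X | Y) / support(X)."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]  # every subset of a frequent set is frequent
                if conf >= min_conf:
                    out.append((lhs, itemset - lhs, conf))
    return out
```

On the five market baskets {bread, milk}, {bread, diapers, beer, eggs}, {milk, diapers, beer, cola}, {bread, milk, diapers, beer}, {bread, milk, diapers, cola} with a minimum support of 3, this finds 4 frequent single items and 4 frequent pairs. The rule beer -> diapers has confidence 1.0, and diapers -> beer has confidence 0.75.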
Colliding Worlds: Hot Emerging Topics
We will cover several hot emerging topics, such as big data, NoSQL,
crowdsourcing, information extraction, and social media analysis.
- Big Data
Details to be added later.
- Information Extraction
Managing information extraction
You can find the SIGMOD tutorial, from which I created the above
In case you want to read more:
- Social Media Analysis
Slides will be mailed out later.
Details to be posted later