CS784: Formal Topics in Databases

Spring 2007

 

 

 

Instructor: Jeff Naughton

Office: CS4361

Office Hours: Tuesday 4:00 – 6:00.

Email: naughton@cs.wisc.edu

 

 

 

Lecture: 9:30 – 10:45 TR, 1257 Comp Sci

 

Overview: The official name of this course is “Data models and Languages.” Unfortunately that is not a good description of what I intend to cover this semester (it is a “legacy name” left over from the past.) What I do intend to cover is “interesting material that is not covered in 564 or 764, that is for the most part somewhat foundational in nature, and that is relevant to research and industrial development going on today in the broad context of data management.

 

Exams: There will be three exams in this course, spread roughly evenly throughout the semester. There is an optional assignment (more on this later.) If you do the assignment, each exam is worth 25% of the course grade and the project is the remaining 25%. If you don’t, each exam is worth 1/3 of the course grade.

 

First midterm: in class, Tuesday October 6. Here is a solution sketch for the first midterm.

Second midterm: in class, approximately Thursday, November 17 (this may change.)

Third exam (end term): in class, Tuesday, December 15.

Optional course projects due: Tuesday, December 22.

 

 

Readings: The following is a guess as to what we will cover, and the order in which we will cover it. This will almost certainly change as the semester progresses, but I will try to give you at least a week or so of warning before adding to or deleting from this list:

 

-          “On the universality of data retrieval languages,” Aho and Ullman.

-          Deductive databases (Datalog.) Ullman notes.

-          Evaluation of recursive programs. Bancilhon and Ramakrishnan

-          Conjunctive queries and their containment equivalences paper

-          (We will skip this one for now. tableau containment paper.)

-          Logic and data integration. logic and integration paper. queries and views. If you want a slightly more detailed description of the “bucket” and “inverse rules” algorithms from the “logic and integration” paper, read sections 6.1-3 of this paper.)

-          Schema matching.  schema matching survey

-          A schema matching approach to querying heterogeneous RDBMS (Guest lecture 3/8, paper for the lecture: Kite.)

-          (We will probably skip this one, but it is an easy read for folks who are interested in the current status as of a few years ago. retrospective on data integration)

-          Information retrieval overview. I/R overview

-          Privacy and DBMS: k-anonymity Mondrian.

-          Web search: page rank. pagerank.pdf

-          Web search: google overview

-          DB and Web Search. Brewer

-          Keyword search over RDBMS Discover paper

-          Data mining: association rules. associationRules.pdf

-          Data extraction: AnHai’s lecture. AnHai's Powerpoint. The astute student will note that this is 179 slides. You don’t have to read anything after slide 94. In addition, you can skip slides 23 – 32.

-          Data mining: clustering. Birch