CS784: Formal Topics in Databases

Fall 2009

 

 

 

Instructor: Jeff Naughton

Office: CS4361

Office Hours: Tuesday 4:30 – 6:00.

Email: naughton@cs.wisc.edu

 

Class mailing list: compsci784-1-f09@lists.wisc.edu

 

 

 

 

Lecture: 9:30 – 10:45 TR, 1257 Comp Sci

 

Overview: The official name of this course is “Data models and Languages.” Unfortunately that is not a good description of what I intend to cover this semester (it is a “legacy name” left over from the past.) What I do intend to cover is “interesting material that is not covered in 564 or 764, that is for the most part somewhat foundational in nature, and that is relevant to research and industrial development going on today in the broad context of data management.

 

Exams: There will be three exams in this course, spread roughly evenly throughout the semester. There is an optional assignment (more on this later.) If you do the assignment, each exam is worth 25% of the course grade and the project is the remaining 25%. If you don’t, each exam is worth 1/3 of the course grade.

 

First midterm: in class, Tuesday October 6. Here is solution sketch for the first midterm this year.

Second midterm: in class, Thursday, November 19. Here is an old exam.

Third exam (end term): in class, Tuesday, December 15.

Optional course projects due: Tuesday, December 22.

 

 

Readings: The following is a guess as to what we will cover, and the order in which we will cover it. This will almost certainly change as the semester progresses, but I will try to give you at least a week or so of warning before adding to or deleting from this list:

 

-          “On the universality of data retrieval languages,” Aho and Ullman.

-          Deductive databases (Datalog.) Ullman notes.

-          Evaluation of recursive programs. Bancilhon and Ramakrishnan

-          (We will skip these two for now; Conjunctive queries and their containment equivalences paper; tableau containment paper. Instead, read the email I sent the class on containment for Datalog rules.)

-          Logic and data integration. logic and integration paper. If you want a slightly more detailed description of the “bucket” algorithm from the “logic and integration” paper, read sections 6.1 and 6.2 of this paper: queries and views)

Midterm 1 covers up through here.

 

-          Schema matching.  schema matching survey

-           (We will probably skip this one, but it is an easy read for folks who are interested in the current status as of a few years ago. retrospective on data integration)

-          Information retrieval overview. I/R overview

-          Web search: page rank. pagerank.pdf

-          DB and Web Search. Brewer

-          Web search: google overview

-          Oct. 20, Declarative Querying for Biological Sequence Databases. Lecture by Jignesh Patel.

-          Oct. 22, Keyword search over RDBMS Discover paper. Lecture by AnHai Doan.

-          Privacy and DBMS: k-anonymity Mondrian.

-          Oct. 29, MapReduce vs. DBMS.

-          Data mining: association rules. associationRules.pdf

-          Data mining: clustering. Birch

Midterm 2 covers through here

-          Stream data management. stanford stream overview

-          Data extraction: AnHai’s lecture. AnHai's Powerpoint. The astute student will note that this is 179 slides. You don’t have to read anything after slide 94. In addition, you can skip slides 23 – 32.