Schedule

This is a tentative schedule that will be updated throughout the course.

Lecture | Date | Topic | References | Notes
1 | Tu 1/17 | Introduction | | introduction
2 | Th 1/19 | Conjunctive Queries | Alice Book: Ch. 3, 4 | lecture1
3 | Tu 1/24 | Beyond Conjunctive Queries | Alice Book: Ch. 3, 4 | lecture1
4 | Th 1/26 | Query Containment | Alice Book: Ch. 6.2 | lecture2
5 | Tu 1/31 | Query Containment | Alice Book: Ch. 6.2 | lecture2
6 | Th 2/2 | Intro to Query Complexity | | lecture3
7 | Tu 2/7 | Acyclic Joins | Alice Book: Ch. 6.4 | lecture4
8 | Th 2/9 | Acyclic Joins | | lecture4
9 | Tu 2/14 | Beyond Acyclic Queries | | lecture5
10 | Th 2/16 | Beyond Acyclic Queries | | lecture5
11 | Tu 2/21 | Size Bounds for Joins | | lecture6
12 | Th 2/23 | Datalog: semantics | Alice Book: Ch. 12 | lecture7
13 | Tu 2/28 | Datalog: bottom-up evaluation | Alice Book: Ch. 13.1 | lecture8
14 | Th 3/2 | Datalog: top-down evaluation | Alice Book: Ch. 13.2 | lecture8
15 | Tu 3/7 | Datalog: magic sets | Alice Book: Ch. 13.3 | lecture8
16 | Th 3/9 | Datalog: negation | Alice Book: Ch. 15.1-15.3 | lecture9
17 | Tu 3/14 | Views and Rewriting | Paper Review |
18 | Th 3/16 | Parallel Query Processing | | lecture10
19 | Tu 3/28 | Parallel Query Processing | | lecture10
- | Th 3/30 | NO CLASS | |
20 | Tu 4/4 | Data Streaming | Paper Review | notes (Muthukrishnan)
21 | Th 4/6 | Data Streaming | | notes (Chakrabarti)
22 | Tu 4/11 | Probabilistic Databases | Paper Review |
23 | Th 4/13 | Probabilistic Databases | |
24 | Tu 4/18 | Consistent Query Answering | | CQA for primary keys
25 | Th 4/20 | Provenance: Why | Paper Review | Chapter 2
26 | Tu 4/25 | Provenance: How | | Chapter 3
27 | Th 4/27 | Differential Privacy | Paper Review | Differential Privacy
- | Tu 5/2 | Project Presentation | |
- | Th 5/4 | Project Presentation | |

Paper Reviews

  • Tuesday 3/14: Answering queries using views: A survey. Focus on Sections 1, 2, and 3.

    • Give a brief summary of the problem.

    • What is the difference between certain answers and query rewriting? When would one use each technique?

  • Tuesday 4/4: Data Streams: Algorithms and Applications. Sections 1-6

    • How do data stream systems differ from traditional relational databases?

    • What are some applications of data streaming?

    • Describe some data stream models, and discuss a scenario where each may be applicable.

  • Tuesday 4/11: Probabilistic Databases: Diamonds in the Dirt

    • Briefly describe what a probabilistic database is.

    • What are some key applications of probabilistic databases?

    • Why do you think that probabilistic databases have not been widely adopted yet?

  • Thursday 4/20: Provenance in Databases: Why, How, and Where (read only the introduction)

    • What are some applications of provenance?

    • What is the difference between why, how and where provenance?

    • Discuss the differences between eager and lazy provenance computation.

Reading Material


During the first lectures, some of the material will be from the Alice Book:

  • Foundations of Databases, Abiteboul, Hull, Vianu (book).

Some of the papers that we will study throughout the course:

Query Complexity

  • Optimal implementation of conjunctive queries in relational databases, Chandra, Merlin, STOC 1977 (paper)

  • The Complexity of Relational Query Languages, Vardi, STOC 1982 (paper)

  • Algorithms for acyclic database schemes, Yannakakis, VLDB 1981.

  • Size bounds and query plans for relational joins, Atserias, Grohe, Marx, FOCS 2008 (paper)

  • Hypertree Decompositions and Tractable Queries, Gottlob, Leone, Scarcello, JCSS 2002 (paper)

  • Leapfrog Triejoin: a worst-case optimal join algorithm, Veldhuizen, ICDT 2014 (paper)

  • Skew Strikes Back: New Developments in the Theory of Join Algorithms, Ngo, Re, Rudra, SIGMOD Record 2013 (paper)

Datalog

  • What You Always Wanted to Know About Datalog (And Never Dared to Ask), Ceri, Gottlob, Tanca, TKDE 1989 (paper)

Parallel Query Processing

  • MapReduce: simplified data processing on large clusters, Dean, Ghemawat, OSDI 2004 (paper)

  • MapReduce and parallel DBMSs: friends or foes?, Stonebraker et al., CACM 2010 (paper)

  • Optimizing Joins in a Map-Reduce Environment, Afrati, Ullman, EDBT 2010 (paper)

  • A Guide to Formal Analysis of Join Processing in Massively Parallel Systems, Koutris, Suciu, SIGMOD Record 2016 (paper)

Data Streaming

  • Models and issues in data stream systems, Babcock, Babu, Datar, Motwani, Widom, PODS 2002 (paper)

  • The space complexity of approximating the frequency moments, Alon, Matias, Szegedy, STOC 1996 (paper)

Uncertain Data

  • Probabilistic Databases, Suciu, Olteanu, Re, Koch (book)

  • Probabilistic Databases: Diamonds in the Dirt, Dalvi, Re, Suciu, CACM 2009 (paper)

  • The dichotomy of probabilistic inference for unions of conjunctive queries, Dalvi, Suciu, JACM 2012 (paper)

  • Consistent Query Answering: Five Easy Pieces, Chomicki, ICDT 2007 (paper)

  • Consistent Query Answers in Inconsistent Databases, Arenas, Bertossi, Chomicki, PODS 1999 (paper)

Provenance

  • Provenance Semirings, Green, Karvounarakis, Tannen, PODS 2007 (paper)

  • Provenance in Databases: Why, How, and Where, Cheney, Chiticariu, Tan, Foundations and Trends in Databases 2009 (paper)

  • On Propagation of Deletions and Annotations Through Views, Buneman, Khanna, Tan, PODS 2002 (paper)

  • Maximizing Conjunctive Views in Deletion Propagation, Kimelfeld, Vondrak, Williams, TODS 2012 (paper)

Privacy

  • On the Complexity of Optimal K-Anonymity, Meyerson, Williams, PODS 2004 (paper)

  • Revealing Information while Preserving Privacy, Dinur, Nissim, PODS 2003 (paper)

  • A Firm Foundation for Private Data Analysis, Dwork, CACM 2011 (paper)