Christopher Ré
Email: chrisre at cs.wisc.edu
Phone: (608) 263-5489
Office 4363
Office Hours: By Appointment
Department of Computer Sciences
University of Wisconsin-Madison
1210 W. Dayton St.
Madison, WI 53706-1685

  • SIGMOD EVENTS New Researcher Symposium and Undergraduate Research Poster Competition. Please come to these two events at this week's SIGMOD to encourage the next generation of database nerds!
    • PODS 2012 Hung Ngo, Ely Porat, Atri Rudra, and I have a new PODS paper called "Worst-Case Optimal Join Algorithms." For all join queries, we provide the first algorithm that is optimal in terms of data complexity. The key to the result is an algorithmic proof of a pair of inequalities from discrete geometry: the Loomis-Whitney inequality from the 1940s and the more recent Bollobás-Thomason inequality. We show that these classical inequalities are equivalent to the beautiful recent results by Atserias, Grohe, and Marx. Our work also provides an algorithmic proof of these results. In contrast, all previous proofs are non-constructive. Thank you to the program committee for selecting this paper for the Best Paper Award.
    • SIGMOD 2012 Aaron Feng, Arun Kumar, Ben Recht, and I have a paper, Towards a Unified Architecture for In-RDBMS Analytics, accepted to SIGMOD 2012. The code and data are available here (Bismark), along with the paper (full version). Thank you to Greenplum and Oracle for valuable discussions that helped us understand this problem!

  • ACL 2012 Feng Niu, Ce Zhang, Jude Shavlik, and I have a paper that builds on our experiences with DeepDive to study the impact of Big Data with Distant Supervision versus Crowd-Sourced data on the quality of relationship extraction from the Web. The paper title is Big Data versus the Crowd: Looking for Relationships in All the Right Places.
  • COLT 2012 Ben Recht and I have a new manuscript. The starting point of the manuscript is that practitioners who run sequential random sampling algorithms (like stochastic gradient) sample from their data without replacement. In many cases, people have observed that sampling without replacement empirically converges to an optimal value faster than with-replacement sampling approaches. There is, however, a gap in the current theory, which provides faster convergence rates for with-replacement sampling than for without-replacement sampling. This paper takes a step toward closing this gap. We establish that without-replacement methods do converge faster in several sampling models and propose a general inequality, a symmetrized (noncommutative) arithmetic-geometric-mean inequality, that would close this gap in many cases. CRC Coming Soon.
  • VLDB 2012. Accepted for publication in VLDB 2012 in Istanbul.
    • Arun Kumar and I have a paper, Probabilistic Management of OCR Data using an RDBMS. The paper is our first attempt to understand the implications of combining rich content models like OCR with the sophisticated querying capabilities of an RDBMS. The state-of-the-art models underlying these content models are probabilistic, e.g., OCRopus for Google Books. These models have very high quality, but the sheer size of these models can destroy query processing performance. Arun's idea is to use some ideas from statistics to compress the model; in turn, this allows him to trade run time for quality. Thank you to the Microsoft Jim Gray Systems Lab for supporting Arun's work. (Code and data here)
    • The MADlib Analytics Library or MAD Skills, the SQL. Joseph M. Hellerstein, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleks Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, Arun Kumar, and I have a paper about the status of MADlib, which is an awesome open-source effort led by Greenplum. Thank you, EMC!
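
The sampling distinction in the COLT 2012 item above can be made concrete with a small sketch. This is an illustration only, not code from the manuscript: the function names and the toy objective (estimating a mean with incremental gradient steps) are assumptions chosen for brevity.

```python
import random


def with_replacement_order(n, epochs, rng):
    """Draw n * epochs indices independently and uniformly; repeats are allowed."""
    return [rng.randrange(n) for _ in range(n * epochs)]


def without_replacement_order(n, epochs, rng):
    """Reshuffle the indices every epoch, so each example is visited once per epoch."""
    order = []
    for _ in range(epochs):
        perm = list(range(n))
        rng.shuffle(perm)
        order.extend(perm)
    return order


def sgd_mean(data, order, step=0.1):
    """Incremental gradient steps on f(x) = (1/2n) * sum_i (x - data[i])**2.

    The minimizer is the mean of data; `order` fixes which example each step uses.
    """
    x = 0.0
    for i in order:
        x -= step * (x - data[i])
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    data = [float(v) for v in range(10)]
    target = sum(data) / len(data)
    for name, make_order in [("with replacement", with_replacement_order),
                             ("without replacement", without_replacement_order)]:
        estimate = sgd_mean(data, make_order(len(data), 20, rng))
        print(name, abs(estimate - target))
```

The two order functions are the only difference between the runs, which keeps the comparison focused on the sampling scheme rather than on the step size or the objective.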

I am an assistant professor in the department of Computer Sciences at the University of Wisconsin-Madison. My interests are theoretical and practical problems in data management. Details of my work can be found in my papers and my project website, Hazy. I believe that the future of computing is in data management. If you agree, are an outstanding student, and are beginning graduate work, then please send me an email.


Index by year
2012   2011   2010   2009   2008   2007   2006   2005   2004   2003   2002  


2012
  • Joseph M. Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleks Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, and Arun Kumar
  • The MADlib Analytics Library or MAD Skills, the SQL.
    PVLDB 2012
  • Arun Kumar, and Christopher Ré
  • Probabilistic Management of OCR Data using an RDBMS
    PVLDB 2012,
    [Full Version]
  • Fei Chen, Xixuan Feng, Christopher Ré, and Min Wang
  • Optimizing Statistical Information Extraction Programs Over Evolving Text
    ICDE 2012,
    [Full Version]
  • Christopher Ré, and Dan Suciu
  • Understanding cardinality estimation using entropy maximization
    ACM Trans. Database Syst. Volume 37, 2012, p. 6
  • Aaron Feng, Arun Kumar, Benjamin Recht, and Christopher Ré
  • Towards a Unified Architecture for In-RDBMS Analytics
    SIGMOD Conference, 2012,
    [Full Version]
  • Hung Q. Ngo, Ely Porat, Atri Rudra, and Christopher Ré
  • Worst-case Optimal Join Algorithms
    PODS, 2012,
    Winner of the Best Paper Award
  • Ce Zhang, Feng Niu, Christopher Ré, and Jude Shavlik
  • Big Data versus the Crowd: Looking for Relationships in All the Right Places
    ACL, 2012,
  • Benjamin Recht, and Christopher Ré
  • Toward a noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences
    COLT, 2012,
    [Full Version]

2011
  • Dan Suciu, Dan Olteanu, Christopher Ré, and Christoph Koch
  • Probabilistic Databases
    Morgan Claypool's Synthesis Lectures on Data Management, 2011,
  • Mehmet Levent Koc, and Christopher Ré
  • Incrementally maintaining classification using an RDBMS
    PVLDB Volume 4, 2011, p. 302-313
  • Feng Niu, Christopher Ré, AnHai Doan, and Jude W. Shavlik
  • Tuffy: Scaling up Statistical Inference in Markov Logic Networks using an RDBMS
    PVLDB Volume 4, 2011, p. 373-384
    [Full Version]
  • Eaman Jahani, Michael J. Cafarella, and Christopher Ré
  • Automatic Optimization for MapReduce Programs
    PVLDB Volume 4, 2011, p. 385-396
  • Nilesh N. Dalvi, Christopher Ré, and Dan Suciu
  • Queries and materialized views on probabilistic databases
    J. Comput. Syst. Sci. Volume 77, 2011, p. 473-490
  • Benjamin Recht, and Christopher Ré
  • Parallel Stochastic Gradient Algorithms for Large-Scale Matrix Completion
    Optimization Online, 2011,
  • Feng Niu, Benjamin Recht, Christopher Ré, and Stephen J. Wright
  • Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
    NIPS, 2011,
    [Full Version]
  • F. Niu, C. Zhang, C. Ré, and J. Shavlik
  • Felix: Scaling Inference for Markov Logic with an Operator-based Approach
    ArXiv e-prints 2011,

2010
  • Michael J. Cafarella, and Christopher Ré
  • Manimal: Relational Optimization for Data-Intensive Programs
    WebDB, 2010,
  • Benny Kimelfeld, and Christopher Ré
  • Transducing Markov Sequences
    PODS, 2010,
    Selected as one of the best papers in PODS 2010
  • Christopher Ré, and Dan Suciu
  • Understanding Cardinality Estimation using Entropy Maximization
    PODS, 2010,
    Selected as one of the best papers in PODS 2010
  • Julie Letchner, Christopher Ré, Magdalena Balazinska, and Matthai Philipose
  • Approximation Trade-Offs in a Markovian Stream Warehouse: An Empirical Study (Short Paper)
    ICDE, 2010,

2009
  • Christopher Ré
  • Managing Large-Scale Probabilistic Databases
    University of Washington, Seattle, 2009
    Winner of SIGMOD Jim Gray Thesis Award
  • Raghav Kaushik, Christopher Ré, and Dan Suciu
  • General Database Statistics Using Entropy Maximization
    DBPL, 2009, p. 84-99
    [Talk]
  • Katherine F. Moore, Vibhor Rastogi, Christopher Ré, and Dan Suciu
  • Query Containment of Tier-2 Queries over a Probabilistic Database
    Management of Uncertain Databases (MUD), 2009,
  • Julie Letchner, Christopher Ré, Magdalena Balazinska, and Matthai Philipose
  • Access Methods for Markovian Streams
    ICDE, 2009, p. 246-257
  • Arvind Arasu, Christopher Ré, and Dan Suciu
  • Large-Scale Deduplication with Constraints Using Dedupalog
    ICDE, 2009, p. 952-963
    [Talk]
    Selected as one of the best papers in ICDE 2009
  • Nilesh N. Dalvi, Christopher Ré, and Dan Suciu
  • Probabilistic databases: Diamonds in the dirt
    Commun. ACM Volume 52, 2009, p. 86-94
    [Full Version]
  • S. Manegold, I. Manolescu, L. Afanasiev, J. Feng, G. Gou, M. Hadjieleftheriou, S. Harizopoulos, P. Kalnis, K. Karanasos, D. Laurent, M. Lupu, N. Onose, C. Ré, V. Sans, P. Senellart, T. Wu, and D. Shasha
  • Repeatability & Workability Evaluation of SIGMOD 2009
    SIGMOD Record Volume 38, 2009, p. 40-43
  • Julie Letchner, Christopher Ré, Magdalena Balazinska, and Matthai Philipose
  • Lahar Demonstration: Warehousing Markovian Streams
    PVLDB Volume 2, 2009, p. 1610-1613
  • Christopher Ré, and Dan Suciu
  • The Trichotomy of HAVING Queries on a Probabilistic Database
    VLDB Journal 2009,

2008
  • Christopher Ré
  • Managing Probabilistic Data with Mystiq (Plenary Talk)
    Dagstuhl Seminar 08421: Uncertainty Management in Information Systems, 2008,
  • Christopher Ré, and Dan Suciu
  • Advances in Processing SQL Queries on Probabilistic Data
    Invited Abstract in INFORMS 2008, Simulation., 2008,
  • Ting-You Wang, Christopher Ré, and Dan Suciu
  • Implementing NOT EXISTS Predicates over a Probabilistic Database
    QDB/MUD, 2008, p. 73-86
  • Nodira Khoussainova, Evan Welbourne, Magdalena Balazinska, Gaetano Borriello, Garrett Cole, Julie Letchner, Yang Li, Christopher Ré, Dan Suciu, and Jordan Walke
  • A demonstration of Cascadia through a digital diary application
    SIGMOD Conference, 2008, p. 1319-1322
  • Christopher Ré, Julie Letchner, Magdalena Balazinska, and Dan Suciu
  • Event queries on correlated probabilistic streams
    SIGMOD Conference, 2008, p. 715-728
  • Christopher Ré, and Dan Suciu
  • Managing Probabilistic Data with MystiQ: The Can-Do, the Could-Do, and the Can't-Do
    SUM, 2008, p. 5-18
  • Julie Letchner, Christopher Ré, Magdalena Balazinska, and Matthai Philipose
  • Challenges for Event Queries over Markovian Streams
    IEEE Internet Computing Volume 12, 2008, p. 30-36
  • Christopher Ré, and Dan Suciu
  • Approximate lineage for probabilistic databases
    PVLDB Volume 1, 2008, p. 797-808
    [Full Version][Talk]
    The version above corrects an error in the statement of lemma 3.7.
  • Magdalena Balazinska, Christopher Ré, and Dan Suciu
  • Systems aspects of probabilistic data management (Part I)
    PVLDB Volume 1, 2008, p. 1520-1521
    [Talk]
  • Magdalena Balazinska, Christopher Ré, and Dan Suciu
  • Systems aspects of probabilistic data management (Part II)
    PVLDB Volume 1, 2008, p. 1520-1521
    [Talk]

2007
  • Michael J. Cafarella, Christopher Ré, Dan Suciu, and Oren Etzioni
  • Structured Querying of Web Text Data: A Technical Challenge
    CIDR, 2007, p. 225-234
  • Christopher Ré, and Dan Suciu
  • Management of data with uncertainties
    CIKM, 2007, p. 3-8
  • Christopher Ré, Dan Suciu, and Val Tannen
  • Orderings on Annotated Collections
    Liber Amicorum in honor of Jan Paredaens 60th Birthday, 2007,
  • Christopher Ré, and Dan Suciu
  • Efficient Evaluation of HAVING Queries
    DBPL, 2007, p. 186-200
    [Full Version][Talk]
  • Christopher Ré, Nilesh N. Dalvi, and Dan Suciu
  • Efficient Top-k Query Evaluation on Probabilistic Data
    ICDE, 2007, p. 886-895
    [Full Version][Talk]
  • Christopher Ré, and Dan Suciu
  • Materialized Views in Probabilistic Databases for Information Exchange and Query Optimization
    VLDB, 2007, p. 51-62
    [Full Version][Talk]
  • Christopher Ré
  • Applications of Probabilistic Constraints (General Exam Paper)
    University of Washington TR#2007-03-03 2007,
  • Eytan Adar, and Christopher Ré
  • Managing Uncertainty in Social Networks
    IEEE Data Eng. Bull. Volume 30, 2007, p. 15-22

2006
  • Giorgio Ghelli, Christopher Ré, and Jérôme Siméon
  • XQuery!: An XML Query Language with Side Effects
    EDBT Workshops, 2006, p. 178-191
  • Christopher Ré, Jérôme Siméon, and Mary F. Fernández
  • A Complete and Efficient Algebraic Compiler for XQuery
    ICDE, 2006, p. 14
  • Christopher Ré, Nilesh N. Dalvi, and Dan Suciu
  • Query Evaluation on Probabilistic Databases
    IEEE Data Eng. Bull. Volume 29, 2006, p. 25-31

2005
  • Chavdar Botev, Hubert Chao, Theodore Chao, Yim Cheng, Raymond Doyle, Sergey Grankin, Jon Guarino, Saikat Guha, Pei-Chen Lee, Dan Perry, Christopher Ré, Ilya Rifkin, Tingyan Yuan, Dora Abdullah, Kathy Carpenter, David Gries, Dexter Kozen, Andrew C. Myers, David I. Schwartz, and Jayavel Shanmugasundaram
  • Supporting workflow in a course management system
    SIGCSE, 2005, p. 262-266
  • Jihad Boulos, Nilesh N. Dalvi, Bhushan Mandhani, Shobhit Mathur, Christopher Ré, and Dan Suciu
  • MYSTIQ: a system for finding more answers by using probabilities
    SIGMOD Conference, 2005, p. 891-893
  • Nathan Bales, James Brinkley, E. Sally Lee, Shobhit Mathur, Christopher Ré, and Dan Suciu
  • A Framework for XML-Based Integration of Data, Visualization and Analysis in a Biomedical Domain
    XSym, 2005, p. 207-221

2004
  • Christopher Ré, Jim Brinkley, Kevin Hinshaw, and Dan Suciu
  • Distributed XQuery
    Workshop on Information Integration on the Web (IIWeb), 2004, p. 116-121

2003
  • Werner Vogels, and Christopher Ré
  • WS-Membership - Failure Management in a Web-Services World
    WWW (Alternate Paper Tracks), 2003,

2002
  • Werner Vogels, Christopher Ré, Robbert van Renesse, and Kenneth P. Birman
  • A Collaborative Infrastructure for Scalable and Robust News Delivery
    ICDCS Workshops, 2002, p. 655-659

Christopher (Chris) Ré is an assistant professor in the department of Computer Sciences at the University of Wisconsin-Madison. The goal of his work is to enable users and developers to build applications that more deeply understand and exploit data. Chris received his PhD from the University of Washington, Seattle under the supervision of Dan Suciu. For his PhD work in the area of probabilistic data management, Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's papers have received four best-paper or best-of-conference citations (best paper in PODS 2012, best-of-conference in PODS 2010 twice, and best-of-conference in ICDE 2009). Chris received an NSF CAREER Award in 2011 and was recently granted his first patent.





  • Oracle and Oracle Labs. Thank you to Oracle Labs and the Oracle Analytics team for their generous support of the Hazy group's work! We are really excited to learn what customers need from in-database analytics. This will help support Arun's work. He and I are both very excited!
  • Greenplum/EMC. Thank you to Greenplum/EMC for their generous support of the Hazy group's work! We are really excited to learn from this collaboration, and to push some of Aaron and Arun's work into MADlib, an awesome open-source library for scalable in-database analytics.
  • Office of Naval Research. Thank you to the ONR for support of my work under award no. N000141210041! This funding will allow our group to embark on a theoretical investigation of the foundations of building a large-scale, easy-to-use data-analysis system.
  • NSF CAREER. I recently received the NSF CAREER award (IIS-1054009). Thank you to the NSF for their generous support of Hazy.
  • IceCube. The Hazy group is extremely excited to announce funding for an exploratory data analysis project. The goal of the project is to apply Hazy's ideas to the problem of detecting neutrinos from the Big Bang in collaboration with the IceCube Neutrino Detector and Wisconsin Institutes for Discovery.
  • LogicBlox. The Hazy group is excited to collaborate with LogicBlox! Thank you, LogicBlox, for your generous research gift to support our ongoing work on Tuffy and Felix.
  • DARPA. DARPA's Machine Reading Program has the goal of understanding information expressed as free-form text. We are building a scalable engine to process a probabilistic logic called Markov Logic to support this effort.
  • Thank You! The Hazy group would like to thank our sponsors in the past and coming year: The Microsoft Jim Gray Lab, DARPA/AFOSR via SRI, the NSF, Google, Johnson Controls Inc., the University of Wisconsin-Madison, the Office of Naval Research, and Physical Layer Systems. In addition, we would like to thank our collaborators at the Wisconsin Institutes for Discovery, HP Labs-China, LogicBlox, Greenplum, Oracle and IBM.

Current Students

  • Victor Bittorf (Project: Data Analysis in IceCube)
  • Xixuan (Aaron) Feng (Project: Text Analytics with Hazy)
  • Arun Kumar (Project: Incorporating Speech and OCR data into Hazy)
  • Feng Niu (co-advised with AnHai Doan) (Project: Machine Reading)
  • Mark Wellons (co-advised with Ben Recht) (Project: Data Analysis in IceCube)
  • Ce Zhang (Project: Machine Reading)

Alumni

  • Josh Slauson
  • Vinod Ramachandran (MS, 2011, first employment: Oracle)
  • M. Levent Koc (MS, 2011, first employment: Google)
  • Balaji Gopalan (MS, 2010, first employment: Google)

Current Project: Hazy

The Hazy Website contains some of the initial components of our system, Hazy. Newly released components include Felix, which contains our first cut of an automatic optimizer for MLN programs, and Bismark, which allows users to specify machine learning tasks inside an RDBMS using incremental gradient methods.

Code, Videos, and Data

Demos of DeepDive and WiscI are here. The goal is to understand the challenges in building, scaling, and maintaining a probabilistic inference system in service of high-quality information systems. Both demonstrations enrich Wikipedia with structured data that is extracted from massive volumes of text, video, audio, and existing structured sources. There's also an overhyped video that we made to amuse ourselves!

Bismark is available now! (SIGMOD 2012)

Felix is available now! (includes Tuffy) This is a scalable system for Markov Logic that powers WiscI (it does deep analysis on 500M+ web pages and 200k+ videos.)

Jellyfish (Parallel Matrix Factorization) and Hogwild! (NIPS 2011) are available!

Staccato is here to store your OCR data! (VLDB 2012)

NB: There are VMs for each of my group's data analysis tools. Please let us know how we can make it easier for you to try out our stuff.


Completed Projects

MystiQ is a probabilistic relational database designed to handle imprecision resulting both from newer applications such as information extraction and social networking data and classical applications such as object reconciliation and data cleaning. The central theme is processing complex SQL queries on large amounts of probabilistic relational data. This work has developed techniques such as extensional plans for aggregates, multisimulation, materialized views of probabilistic data, processing of NOT EXISTS predicates, and approximate lineage. A recent overview of the system is in our upcoming SUM 2008 paper. For a broader, biased look at the state of the art, see our tutorial (powerpoint part I & II) that was delivered at VLDB 2008 in Auckland, New Zealand or the extended version of our CACM paper.


Lahar is a successor to the Peex project, which is a part of the larger Markovian Streams Project. The goal of both projects is to manage data from the RFID ecosystem, which is a building-wide RFID deployment at the Paul Allen Center at the University of Washington. The technical contribution of this work is a suite of algorithms and access methods to manage data in both near real-time and historical streams. This project is joint work with Julie Letchner and Prof. Magdalena Balazinska. For an overview, please see our IEEE Internet Computing article, Challenges for Event Queries over Markovian Streams. For a more detailed account, see our ICDE 2009 research paper, Access Methods for Markovian Streams, or our ICDE 2010 research paper, Approximation Trade-Offs in a Markovian Stream Warehouse: An Empirical Study.

 


Dedupalog is a declarative language for specifying deduplication tasks. In our upcoming ICDE 2009 paper, Large-Scale Deduplication with Constraints using Dedupalog, we define a syntax and semantics for our new language. Further, we provide algorithms that can cluster massive datasets extremely fast, e.g., cluster all of CiteSeer in a minute or two. The technical key is an extremely scalable algorithm that we prove is a constant-factor approximation of the optimal for a large fragment of Dedupalog programs. This is joint work with Dr. Arvind Arasu and Prof. Dan Suciu that was done while visiting the DMX group at Microsoft Research. This paper has been invited to a special issue of TKDE for the best papers in ICDE 2009.


Galax is an open-source implementation of XQuery 1.0, the W3C XML Query Language. My work on Galax included the design of the algebraic compiler, which recovered classical optimizations, notably join optimizations, inside the full XQuery language. This work has continued without me to produce some very cool work at SIGMOD 2008.


XQuery! (read: XQuery-Bang) is a fully compositional update language that extends XQuery 1.0, the W3C XML Query Language. The contribution is recovering classical database optimizations (joins, cursors and indices) while at the same time providing imperative features (variable assignment). 


SilkRoute is a platform to translate XQuery to SQL in a performant and largely complete way. It allows users to publish their relational data effectively and easily. XBrain is a web-based application built on SilkRoute designed to allow researchers to query SIG’s Brain Mapping Database. The query language used is XQuery, and the resulting XML can be viewed directly or automatically transformed into HTML, CSV, or visualized on an image of brain regions.


  • Spring 13: CS 764-1, Topics in Database Management Systems
  • Fall 12: None.
  • Spring 12: None.
  • Fall 11: CS 564-1, Database Management Systems
  • Spring 11: CS 764-1, Topics in Database Management Systems
  • Fall 10: CS 564-1, Database Management Systems
  • Spring 10: CS 838-3, Probabilistic Data Management
  • Spring 10: CS 900-1, Presentation Seminar for Database Students
  • Fall 09: CS 564-2, Database Management Systems