Christopher Ré


Email: chrisre at cs.wisc.edu
Department of Computer Sciences
University of Wisconsin-Madison
Phone: (608) 263-5489
Office 4363



GAC Office Hours: TW 10a-11a.
CS564-2 Office Hours: M 10-11, W 11-12.

I am an assistant professor in the Department of Computer Sciences at the University of Wisconsin-Madison. My interests are theoretical and practical problems in data management. Details of my work can be found here. I believe that the future of computing is in data management. If you agree, are an outstanding student, and are looking to begin graduate work, please send me an email.


Ongoing Project Descriptions

MystiQ is a probabilistic relational database designed to handle imprecision arising both from newer applications, such as information extraction and social networking data, and from classical applications, such as object reconciliation and data cleaning. The central theme is processing complex SQL queries on large amounts of probabilistic relational data. This work has developed techniques such as extensional plans for aggregates, multisimulation, materialized views of probabilistic data, processing of NOT EXISTS predicates, and approximate lineage. A recent overview of the system is in our upcoming SUM 2008 paper. For a broader, biased look at the state of the art, see our tutorial (PowerPoint, parts I & II), which was delivered at VLDB 2008 in Auckland, New Zealand, or the extended version of our upcoming CACM paper.
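
To give a flavor of what querying probabilistic relational data means, here is a minimal sketch in Python. It is not MystiQ code. It assumes the simplest model, in which every tuple is present independently with its own marginal probability, and the table contents and the helper name prob_exists are hypothetical. Under that assumption, the probability that a Boolean "exists" query is true is one minus the probability that every matching tuple is absent.

```python
from math import prod


def prob_exists(tuple_probs):
    """Probability that at least one of several independent tuples is present."""
    return 1.0 - prod(1.0 - p for p in tuple_probs)


# Hypothetical extracted facts: (person, relation, organization, probability).
# Each row is assumed to be present independently of the others.
extracted = [
    ("alice", "works_at", "UW", 0.9),
    ("alice", "works_at", "MSR", 0.5),
    ("bob", "works_at", "UW", 0.4),
]

# Boolean query: "does anyone work at UW?"
uw_probs = [p for (_, _, org, p) in extracted if org == "UW"]
answer = prob_exists(uw_probs)
print(f"P(someone works at UW) = {answer:.2f}")
```

Real queries with joins and correlated tuples are much harder than this single-table case, which is where the techniques listed above come in.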

 

Lahar is a successor to the Peex project, which is part of the larger Markovian Streams Project. The goal of both projects is to manage data from the RFID ecosystem, which is a building-wide RFID deployment at the Paul Allen Center at the University of Washington. The technical contribution of this work is a suite of algorithms and access methods to manage data in both near real-time and historical streams. This project is joint work with Julie Letchner and Prof. Magdalena Balazinska. For an overview, please see our article in the IEEE Journal of Internet Computing, Challenges for Event Queries over Markovian Streams. For a more detailed account, see our ICDE 2009 research paper, Access Methods for Markovian Streams, or check out our upcoming demo at VLDB 2009 in Lyon, France.

 

NB: We plan to publish the data from the RFID ecosystem soon; please check http://lahar.cs.washington.edu for details. We will also be donating this data to the pdbench project. If you have probabilistic/uncertain data, I encourage you to donate it to this great project!

 

 

 


Teaching

Fall 09: CS 564, Database Management Systems (Lecture 2)

 


Completed Project Descriptions

Dedupalog is a declarative language for specifying deduplication tasks. In our upcoming ICDE 2009 paper, Large-Scale Deduplication with Constraints using Dedupalog, we define a syntax and semantics for our new language. Further, we provide algorithms that can cluster massive datasets extremely fast, e.g., cluster all of CiteSeer in a minute or two. The technical key is an extremely scalable algorithm that we prove is a constant-factor approximation of the optimal for a large fragment of Dedupalog programs. This is joint work with Dr. Arvind Arasu and Prof. Dan Suciu that was done while visiting the DMX group at Microsoft Research. This paper has been invited to a special issue of TKDE for the best papers in ICDE 2009.

 

Galax is an open-source implementation of XQuery 1.0, the W3C XML Query Language. My work on Galax included the design of the algebraic compiler, which recovered classical optimizations, notably join optimizations, inside the full XQuery language. This work has continued without me and produced some very cool work at SIGMOD 2008.

 

 

XQuery! (read: XQuery-Bang) is a fully compositional update language that extends XQuery 1.0, the W3C XML Query Language. The contribution is recovering classical database optimizations (joins, cursors and indices) while at the same time providing imperative features (variable assignment). 

 

SilkRoute is a platform to translate XQuery to SQL in a performant and largely complete way. It allows users to publish their relational data effectively and easily. XBrain is a web-based application built on SilkRoute designed to allow researchers to query SIG’s Brain Mapping Database. The query language used is XQuery, and the resulting XML can be viewed directly or automatically transformed into HTML, CSV, or visualized on an image of brain regions.