Christopher (Chris) Ré is an assistant professor in the Department of
Computer Sciences at the University of Wisconsin-Madison. The goal of
his work is to enable users and developers to build applications that
more deeply understand and exploit data. Chris received his PhD from
the University of Washington, Seattle under the supervision of Dan
Suciu. For his PhD work in the area of probabilistic data management,
Chris received the SIGMOD 2010 Jim Gray Dissertation Award. Chris's
papers have received four best-paper or best-of-conference citations
(best paper at PODS 2012; best-of-conference at PODS 2010, twice; and
best-of-conference at ICDE 2009). Chris received an NSF CAREER Award in
2011 and
was recently granted his first patent.
Current Project: Hazy
The Hazy Website contains some of the initial components of our system,
Hazy. Newly released components include Felix, which contains our first
cut of an automatic optimizer for MLN programs, and Bismark, which
allows users to specify machine learning tasks inside an RDBMS using
incremental gradient methods.
Code, Videos, and Data

Demos of DeepDive and WiscI are here. The goal
is to understand the challenges in building, scaling, and maintaining
a probabilistic inference system in service of high-quality
information systems. Both demonstrations enrich Wikipedia with
structured data that is extracted from massive volumes of text, video,
audio, and existing structured sources. There's also an overhyped
video that we made to amuse ourselves!
Bismark is available now! (SIGMOD 2012)
Felix is available now! (includes Tuffy)
This is a scalable system for Markov Logic that powers WiscI (it does
deep analysis on 500M+ web pages and 200k+ videos).
Staccato is here to store your OCR data! (VLDB 2012)
NB: There are VMs for each of my group's
data analysis tools. Please let us know how we can make it easier for
you to try out our stuff.
Completed Projects

Dedupalog is a declarative language for specifying deduplication tasks.
In our ICDE 2009 paper, Large-Scale Deduplication with Constraints using
Dedupalog, we define a syntax and semantics for our new language.
Further, we provide algorithms that can cluster massive datasets
extremely fast, e.g., clustering all of CiteSeer in a minute or two. The
technical key is an extremely scalable algorithm that we prove is a
constant-factor approximation of the optimal for a large fragment of
Dedupalog programs. This is joint work with Dr. Arvind Arasu and Prof.
Dan Suciu that was done while visiting the DMX group at Microsoft
Research. This paper has been invited to a special issue of TKDE for the
best papers in ICDE 2009.
Galax is an open-source implementation of XQuery 1.0, the W3C XML Query
Language. My work on Galax included the design of the algebraic compiler,
which recovered classical optimizations, notably join optimizations,
inside the full XQuery language. This work has continued without me to
produce some very cool work at SIGMOD 2008.
XQuery! (read: XQuery-Bang) is a fully compositional update language
that extends XQuery 1.0, the W3C XML Query Language. The contribution is
recovering classical database optimizations (joins, cursors, and
indices) while at the same time providing imperative features
(variable assignment).
SilkRoute is a
platform to translate XQuery to SQL in a performant and largely
complete way. It allows users to publish their relational data
effectively and easily. XBrain is a web-based application built on
SilkRoute designed to allow researchers to query SIG’s Brain
Mapping Database. The query language used is XQuery, and the
resulting XML can be viewed directly, automatically transformed into
HTML or CSV, or visualized on an image of brain regions.