Photo credit: Hector Garcia-Molina

 

Theodoros (Theo) Rekatsinas

I am an assistant professor at the University of Wisconsin-Madison. I am a member of the UW-Madison Database Group.

Before joining UW, I was a postdoc at Stanford with Chris Ré. I received my Ph.D. in Computer Science from the University of Maryland. My advisors were Amol Deshpande and Lise Getoor.

ML-first Data Integration and Enrichment: My group is exploring the fundamental connections between data integration and data enrichment with statistical learning and probabilistic inference. Our latest effort for ML-first data enrichment is HoloClean, which is built on the idea of weak supervision and probabilistic inference; see our blog post.

Email: thodrek [at] cs.wisc.edu  /  Office: CS4361 @ Computer Sciences



News

  • Data cleaning is a structured prediction problem! Our work on approximate inference over structured instances with noisy categorical data has been accepted at UAI 2019.
  • Our vision on using the noisy channel model to manage noisy data is available here.
  • Happy to be presenting our tutorial on data integration and machine learning at KDD 2019.
  • Are data augmentation and weak supervision the answer to minimal-effort data cleaning? Our work on Few-Shot Learning for Error Detection shows that the answer is yes! To appear in SIGMOD 2019!
  • The whitepaper on the vision around SysML is out!
  • Excited to be giving a talk at ETH on new formal frameworks for managing noisy databases. You can see the recorded talk here.
  • How can we address data cleaning via a noisy channel model? Our work on Probabilistic Unclean Databases is to appear in ICDT 2019!
  • The slides of our tutorial on the synergy between ML and data integration are available here.
  • Excited to release HoloClean as an open-source project! Check it out here!



Publications

NEW! Approximate Inference in Structured Instances with Noisy Categorical Observations
Alireza Heidari, Ihab F. Ilyas, and Theodoros Rekatsinas
UAI 2019 (to appear)

NEW! Unsupervised Functional Dependency Discovery for Data Preparation
Zhihan Guo and Theodoros Rekatsinas
ICLR, Learning from Limited Data Workshop 2019 [arxiv]

NEW! HoloDetect: Few-Shot Learning for Error Detection
Alireza Heidari, Joshua McGrath, Ihab F. Ilyas, and Theodoros Rekatsinas
SIGMOD 2019 (to appear)

NEW! A Formal Framework For Probabilistic Unclean Databases
Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré and Theodoros Rekatsinas
ICDT 2019

NEW! Data Integration and Machine Learning: A Natural Synergy
Xin Luna Dong and Theodoros Rekatsinas
Tutorial@SIGMOD 2018, @VLDB2018, and @KDD2019 (to appear)

Deep Learning For Entity Matching: A Design Space Exploration
Sidharth Mudgal, Han Li, Anhai Doan, Theodoros Rekatsinas, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra
SIGMOD 2018. Code is available here.

Fonduer: Knowledge Base Construction from Richly Formatted Data
Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis and Christopher Ré
SIGMOD 2018

HoloClean: Holistic Data Repairs with Probabilistic Inference
Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas and Christopher Ré
VLDB 2017

SLiMFast: Guaranteed Results for Data Fusion and Source Reliability
Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran and Christopher Ré
ACM SIGMOD 2017

Forecasting Rare Disease Outbreaks from Open Source Indicators
Theodoros Rekatsinas, Saurav Ghosh, Sumiko Mekaru, Elaine Nsoesie, John Brownstein, Lise Getoor and Naren Ramakrishnan
Journal of Statistical Analysis and Data Mining, Best of SDM Special Issue, 2016

SourceSight: Enabling Effective Source Selection
Theodoros Rekatsinas, Amol Deshpande, Xin Luna Dong, Lise Getoor and Divesh Srivastava
ACM SIGMOD, 2016

HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades
Xinran He, Theodoros Rekatsinas, James Foulds, Lise Getoor, and Yan Liu
International Conference on Machine Learning (ICML), 2015

StoryPivot: Comparing and Contrasting Story Evolution
Anja Gruenheid, Donald Kossmann, Theodoros Rekatsinas, and Divesh Srivastava
ACM SIGMOD, 2015

SourceSeer: Forecasting Rare Disease Outbreaks Using Multiple Data Sources (Best Paper Award)
Theodoros Rekatsinas, Saurav Ghosh, Sumiko Mekaru, Elaine Nsoesie, John Brownstein, Lise Getoor and Naren Ramakrishnan
SIAM International Conference on Data Mining (SDM), 2015

Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration
Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor and Divesh Srivastava
7th Biennial Conference on Innovative Data Systems Research (CIDR), 2015

Characterizing and selecting fresh data sources
Theodoros Rekatsinas, Xin Luna Dong and Divesh Srivastava
ACM SIGMOD, 2014

SPARSI: partitioning sensitive data amongst multiple adversaries
Theodoros Rekatsinas, Amol Deshpande and Ashwin Machanavajjhala
Proceedings of the VLDB Endowment Volume 6 Issue 13, 2013

Multi-relational Learning Using Weighted Tensor Decomposition with Modular Loss
Ben London, Theodoros Rekatsinas, Bert Huang and Lise Getoor
NIPS 2012 Workshop on Spectral Algorithms for Latent Variable Models

Local structure and determinism in probabilistic databases
Theodoros Rekatsinas, Amol Deshpande and Lise Getoor
ACM SIGMOD 2012

Fuzzy rule based neuro-dynamic programming for mobile robot skill acquisition on the basis of a nested multi-agent architecture (Best of Conference)
John Karigiannis, Theodoros Rekatsinas and Costas S. Tzafestas
IEEE International Conference on Robotics and Biomimetics (ROBIO), 2010



Manuscripts

Adaptive Querying Strategies for Efficient Crowdsourced Data Extraction
Theodoros Rekatsinas, Amol Deshpande and Aditya Parameswaran, 2016

Quality-Aware Data Source Management
Theodoros Rekatsinas, Doctoral Dissertation, 2015



Students

Current PhD Students:

Current MS and Undergraduate Students:

  • Joshua McGrath
  • Paul Luh

Friends and Collaborators:

  • Alireza Heidari (Waterloo)
  • Han Li (UW-Madison)
  • Calvin Smith (UW-Madison)
  • Sen Wu (Stanford)

Alumni:

  • Jordan Vonderwell (BS 2019, Google)
  • Sidharth Mudgal (MS 2018, Amazon)
  • Sherine Zhang (BS 2018, Stanford for MS)



Teaching

CS639: Data Management for Data Science, Spring 2019

CS839: Probabilistic Graphical Models, Fall 2018

CS839: Data Management for Machine Learning, Spring 2018

CS564: Database Management Systems, Fall 2017



Service

Organizing Committee: ICDE 2019, SysML 2019

PC-Member: SIGMOD 2017-2019, VLDB 2017, ICDE 2018, NIPS 2015-2017, ICML 2018, IJCAI 2016, CIKM 2017-2018

Reviewer: ICML, SIGMOD, VLDB, WSDM, WWW, TKDE, TODS, TSAS, SIGMOD Record