Photo credit: Hector Garcia-Molina

 

Theodoros (Theo) Rekatsinas

I am an assistant professor at the University of Wisconsin-Madison. I am a member of the UW-Madison Database Group.

Before joining UW, I was a Postdoc at Stanford with Chris Ré. I received my Ph.D. in Computer Science from the University of Maryland. My advisors were Amol Deshpande and Lise Getoor.

ML-first Data Integration and Enrichment: My group is exploring the fundamental connections between data integration and data enrichment with statistical learning and probabilistic inference. Our latest effort for ML-first data enrichment is HoloClean, which is built on weak supervision and probabilistic inference; see our blog post. Systems like HoloClean transition from logic to probabilities, in a way similar to the AI revolution of the eighties.

Email: thodrek [at] cs.wisc.edu  /  Office: CS4361 @ Computer Sciences



News

  • Excited to release HoloClean as an open-source project! Check it out here!
  • Learn about ML-first data integration @SIGMOD2018 and @VLDB2018: Our tutorial with Xin Luna Dong explains the synergies between data integration and machine learning!
  • Received NSF CRII to study statistical learning methods for data cleaning!
  • Thanks to Amazon for their generous support!
  • DeepMatcher @ SIGMOD 2018: Sid and Han explore deep learning for entity matching! One model to rule them all, or is it all about tradeoffs? Preprint coming soon!
  • New manuscript on a noisy channel model for unclean relational data
  • Fonduer @ SIGMOD 2018: Sen shows how to build knowledge bases from richly formatted data!



Publications

NEW! A Formal Framework For Probabilistic Unclean Databases
Christopher De Sa, Ihab F. Ilyas, Benny Kimelfeld, Christopher Ré and Theodoros Rekatsinas
Manuscript, 2018

NEW! Data Integration and Machine Learning: A Natural Synergy
Xin Luna Dong and Theodoros Rekatsinas
Tutorial@SIGMOD 2018 and @VLDB2018 (To Appear)

NEW! Deep Learning For Entity Matching: A Design Space Exploration
Sidharth Mudgal, Han Li, AnHai Doan, Theodoros Rekatsinas, Youngchoon Park, Ganesh Krishnan, Rohit Deep, Esteban Arcaute, and Vijay Raghavendra
SIGMOD 2018 (To Appear)

NEW! Fonduer: Knowledge Base Construction from Richly Formatted Data
Sen Wu, Luke Hsiao, Xiao Cheng, Braden Hancock, Theodoros Rekatsinas, Philip Levis and Christopher Ré
SIGMOD 2018 (To Appear)

HoloClean: Holistic Data Repairs with Probabilistic Inference
Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas and Christopher Ré
VLDB 2017

SLiMFast: Guaranteed Results for Data Fusion and Source Reliability
Theodoros Rekatsinas, Manas Joglekar, Hector Garcia-Molina, Aditya Parameswaran and Christopher Ré
ACM SIGMOD 2017

Forecasting Rare Disease Outbreaks from Open Source Indicators
Theodoros Rekatsinas, Saurav Ghosh, Sumiko Mekaru, Elaine Nsoesie, John Brownstein, Lise Getoor and Naren Ramakrishnan
Journal of Statistical Analysis and Data Mining, Best of SDM Special Issue, 2016

SourceSight: Enabling Effective Source Selection
Theodoros Rekatsinas, Amol Deshpande, Xin Luna Dong, Lise Getoor and Divesh Srivastava
ACM SIGMOD, 2016

HawkesTopic: A Joint Model for Network Inference and Topic Modeling from Text-Based Cascades
Xinran He, Theodoros Rekatsinas, James Foulds, Lise Getoor, and Yan Liu
International Conference on Machine Learning (ICML), 2015

StoryPivot: Comparing and Contrasting Story Evolution
Anja Gruenheid, Donald Kossmann, Theodoros Rekatsinas, and Divesh Srivastava
ACM SIGMOD, 2015

SourceSeer: Forecasting Rare Disease Outbreaks Using Multiple Data Sources (Best Paper Award)
Theodoros Rekatsinas, Saurav Ghosh, Sumiko Mekaru, Elaine Nsoesie, John Brownstein, Lise Getoor and Naren Ramakrishnan
SIAM International Conference on Data Mining (SDM), 2015

Finding Quality in Quantity: The Challenge of Discovering Valuable Sources for Integration
Theodoros Rekatsinas, Xin Luna Dong, Lise Getoor and Divesh Srivastava
7th Biennial Conference on Innovative Data Systems Research (CIDR), 2015

Characterizing and selecting fresh data sources
Theodoros Rekatsinas, Xin Luna Dong and Divesh Srivastava
ACM SIGMOD, 2014

SPARSI: partitioning sensitive data amongst multiple adversaries
Theodoros Rekatsinas, Amol Deshpande and Ashwin Machanavajjhala
Proceedings of the VLDB Endowment Volume 6 Issue 13, 2013

Multi-relational Learning Using Weighted Tensor Decomposition with Modular Loss
Ben London, Theodoros Rekatsinas, Bert Huang and Lise Getoor
NIPS 2012 Workshop on Spectral Algorithms for Latent Variable Models

Local structure and determinism in probabilistic databases
Theodoros Rekatsinas, Amol Deshpande and Lise Getoor
ACM SIGMOD 2012

Fuzzy rule based neuro-dynamic programming for mobile robot skill acquisition on the basis of a nested multi-agent architecture (Best Of Conference)
John Karigiannis, Theodoros Rekatsinas and Costas S. Tzafestas
IEEE International Conference on Robotics and Biomimetics (ROBIO), 2010



Manuscripts

Adaptive Querying Strategies for Efficient Crowdsourced Data Extraction
Theodoros Rekatsinas, Amol Deshpande and Aditya Parameswaran, 2016

Quality-Aware Data Source Management
Theodoros Rekatsinas, Doctoral Dissertation, 2015



Students
Advising: Meghana Moorthy Bhat, Zhihan Guo, Joshua McGrath, Jordan Vonderwell, Sherine Zhang
Friends and Collaborators: Alireza Heidari (Waterloo), George Michalopoulos (Waterloo), Sen Wu (Stanford)


Teaching

CS839: Probabilistic Graphical Models, Coming in Fall 2018

CS839: Data Management for Machine Learning, Spring 2018

CS564: Database Management Systems, Fall 2017



Service

PC Demo Chair: ICDE 2019

PC-Member: SIGMOD 2017-2019, VLDB 2017, ICDE 2018, NIPS 2015-2017, ICML 2018, IJCAI 2016, CIKM 2017-2018

Reviewer: ICML, SIGMOD, VLDB, WSDM, WWW, TKDE, TODS, TSAS, SIGMOD Record