- Dec 2017: Discussed misc issues about UW, CS, and living in Madison.
- Sep 2017: A short paper on a system building agenda for data integration.
- Sep 2017: Revised homepage to reflect recent work on data cleaning/integration and data science.
- Oct 2016: A talk on a system building agenda for data integration (and data science).
The Magellan system described below is an example of realizing this agenda for entity matching.
- Jul 2016: Launching Magellan,
a new project to build an end-to-end entity matching system.
- Old news
Research (Group's Homepage)
My work has charted new directions or bet on emerging directions that
I believed would become fundamental to data management. Past directions include
knowledge bases/graphs (2004-2012) and
schema/ontology matching (2000-2010).
In between, from 2010 to 2014, I spent
some time in Silicon Valley, putting my work in these directions
to use and learning a ton about doing things "in the wild". Current directions:
- Data cleaning & integration:
I build end-to-end data integration systems as part of the Python ecosystem of open-source data tools. I also leverage these systems to build cloud/crowd data integration services for lay users.
- Data science: This direction is increasingly critical to the data
management community, yet no clear agenda exists today. I'm working
on an agenda that integrates research, system
building, education, and outreach. This agenda currently focuses
on data quality and builds on the above
work in data cleaning/integration.
- Quick links: DI agenda paper and talk, Magellan homepage and paper,
data science course,
BigGorilla repository of DI tools.
Selected Recent Publications (Google Scholar Entry)
Selected Awards and Honors
- Toward a System Building Agenda
for Data Integration, A. Doan, A. Ardalan, J. Ballard, S. Das,
Y. Govind, P. Konda, H. Li, E. Paulson, P. Suganthan G.C.,
H. Zhang. arXiv 2017.
- CloudMatcher: A Cloud/Crowd Service for Entity Matching,
Y. Govind, E. Paulson, M. Ashok, P. Suganthan G.C., A. Hitawala, A. Doan, Y. Park, P. Peissig, E. LaRose, J. Badger.
BIGDAS Workshop @ KDD-17. slides
- Human-in-the-Loop Challenges for Entity Matching: A Midterm Report,
A. Doan, A. Ardalan, J. Ballard, S. Das, Y. Govind, P. Konda, H. Li, S. Mudgal, E. Paulson, P. Suganthan G.C., H. Zhang.
HILDA Workshop @ SIGMOD-17.
- Falcon: Scaling Up Hands-Off Crowdsourced Entity Matching to Build Cloud Services,
S. Das, P. Suganthan G.C., A. Doan, J. Naughton, G. Krishnan, R. Deep, E. Arcaute, V. Raghavendra, Y. Park.
SIGMOD-17. extended version, slides
- Towards Interactive
Debugging of Rule-Based Entity Matching, F. Panahi, W. Wu,
A. Doan, J. Naughton. EDBT-17.
- Magellan: Toward Building
Entity Matching Management Systems, P. Konda, S. Das,
P. Suganthan G.C., A. Doan, A. Ardalan, J. R. Ballard, H. Li,
F. Panahi, H. Zhang, J. Naughton, S. Prasad, G. Krishnan, R. Deep,
V. Raghavendra. VLDB-16. extended version, slides
- Magellan: Toward
Building Entity Matching Management Systems over Data Science
Stacks, P. Konda, S. Das, P. Suganthan G.C., A. Doan,
A. Ardalan, J. R. Ballard, H. Li, F. Panahi, H. Zhang, J. Naughton,
S. Prasad, G. Krishnan, R. Deep, V. Raghavendra. VLDB-16,
demo paper. Jupyter notebook & datasets for demo
- The Beckman Report on Database Research,
with many authors. Communications of the ACM, 2016. extended version
- Chimera: Large-Scale Classification using Machine Learning, Rules, and Crowdsourcing,
C. Sun, N. Rampalli, F. Yang, A. Doan. VLDB-14, industrial paper. slides
- Corleone: Hands-off Crowdsourcing for Entity Matching,
C. Gokhale, S. Das, A. Doan, J. Naughton, N. Rampalli, J. Shavlik, J. Zhu.
SIGMOD-14. slides, extended report
- Recent classes include data science at the undergraduate
level and CS 564 (Introduction to RDBMSs).
- I spent 3 years (2011-2014) setting up a
professional MS program and a
program in CS at UW-Madison (with help from Karu Sankaralingam,
Jeff Naughton, and Suman Banerjee). These programs have been highly
successful, enrolling hundreds of students.
- Selected recent community service
- member, SIGMOD Advisory Board,
- member, ICDE 10-Year Most Influential Paper Award Committee,
- associate editor, VLDB-16,
- co-chair, industrial program, VLDB-15,
- co-chair, Beckman meeting (with Mike Carey), 2013.
- chair, industrial program, SIGMOD-12.
- I co-authored a data
integration textbook with Alon Halevy and Zack Ives in 2012.