Yash Govind

I have been a Principal Software Engineer (Machine Learning) at Informatica since August 2020, driving the data integration research and development effort in the AI/ML research group on problems such as record linkage (entity matching, deduplication, etc.), schema matching, and, more recently, named entity recognition. I lead a few machine learning engineers, with the goal of bringing these data integration efforts into Informatica products such as MDM and the Claire engine.

Prior to joining Informatica, I received my Ph.D. from the Department of Computer Sciences at the University of Wisconsin - Madison under the guidance of Professor AnHai Doan. During my Ph.D., I built CloudMatcher, a cloud-based service for entity matching. The system was deployed and used by many real-world users and customers, and it extended the Magellan ecosystem that our group had been working on for the past few years. As a result, I was fortunate to be part of two academic awards: 1) the ACM SIGMOD Research Highlight Award and 2) the CACM Research Highlight Award. My thesis work attracted significant commercial interest, and as a result I co-founded GreenBay Technologies, Inc. in 2019 with Professor AnHai Doan and Derek Paulsen. GreenBay Technologies, Inc. was acquired by Informatica in 2020.

Before graduate school, I worked for 7 years in the insurance sector as a software engineer; my last stint was at Humana Inc. in Green Bay, WI. In 2007, I graduated with a Bachelor's degree in Computer Sciences from Pt. Ravi Shankar Shukla University.

Research Interests

Data management, data integration, entity matching, machine learning, crowdsourcing


Publications

Entity Matching Meets Data Science: A Progress Report from the Magellan Project

Yash Govind, P. Konda, and others

Paper (Industrial track)

SIGMOD 2019

CloudMatcher: A Hands-Off Cloud/Crowd Service for Entity Matching [Demonstration Proposal]

Yash Govind, E. Paulson, P. Nagarajan, Paul S. G.C., AnHai Doan, Y. Park, G. M. Fung, D. Conathan, M. Carter, M. Sun

Paper

VLDB 2018

Toward a System Building Agenda for Data Integration (and Data Science)

AnHai Doan, P. Konda, Paul S. G.C., A. Ardalan, J. R. Ballard, S. Das, Yash Govind, H. Li, P. Martinkus, S. Mudgal, E. Paulson, H. Zhang

Paper

IEEE Data Engineering Bulletin 2018

Magellan: Toward Building Entity Matching Management Systems [SIGMOD Research Highlight]

P. Konda, S. Das, Paul S. G.C., P. Martinkus, AnHai Doan, A. Ardalan, J. R. Ballard, Yash Govind, H. Li, F. Panahi, H. Zhang, Jeff Naughton, S. Prasad, G. Krishnan, R. Deep,

Paper

SIGMOD Research Highlight 2018

CloudMatcher: A Cloud/Crowd Service for Entity Matching

Yash Govind, E. Paulson, M. Ashok, Paul G.C., A. Hitawala, AnHai Doan, Y. Park, P. Peissig, E. LaRose, J. Badger

Paper Talk

BIGDAS @ KDD 2017

Human-in-the-Loop Challenges for Entity Matching: A Midterm Report

AnHai Doan, A. Ardalan, J.R. Ballard, S. Das, Yash Govind, P. Konda, H. Li, S. Mudgal, E. Paulson, Paul G.C., H. Zhang

Paper

HILDA @ SIGMOD 2017

Academics

Research Assistant

Department of Computer Sciences
Advisor: AnHai Doan

Working toward a cloud/crowd-based self-service framework for entity matching (EM): a platform that supports macro- and microservices for performing the different steps of the EM process.

March 2016 - Present

Project Assistant

School of Education - UW Madison

Worked on the VidyaMap project, integrating digital text into design-based science classes using D3, Java, and MySQL.

January 2016 - December 2017

Student Researcher

UW School of Medicine and Public Health

Backend developer for the Macademia application at the UW Carbone Cancer Center. Developed WCF services to extract publication data from PubMed.

September 2015 - January 2016

Research Assistant

Department of Computer Sciences

Studied the copy-on-write (CoW) behavior of the B-tree file system (Btrfs) and how Btrfs isolates data from metadata.

October 2014 - May 2015

Experience

Data Analytics Intern

American Family Insurance

Working to build and deploy the CloudMatcher solution at AmFam to match customers across multiple databases and to solve other matching use cases in the insurance domain.

May 2018 - Present

Systems Software Intern (File System)

Huawei Technologies

Worked on extending the IceFS solution to isolate metadata in the Ext3 file system dynamically, based on the size of the file system. Added space isolation: each cube (an abstraction) is allocated a specific number of block groups, and changes can be made only by an administrator using an online tool. Enhanced user-level tools (mke2fs, dumpe2fs, e2fsck, etc.).

May 2015 - August 2015

Project Lead

Humana Inc. (Green Bay, WI)

Developed and maintained web services and solutions for agent reporting, commissions, and bonuses as a backend developer. Developed ETL SSIS packages and improved the performance of SQL queries and packages.

October 2009 - August 2014

Software Developer

Tech Mahindra (Mahindra Satyam)

Worked as a mainframe developer and production support analyst for CIGNA.

September 2007 - October 2009