I have been a Principal Software Engineer (Machine Learning) at Informatica since August 2020, where I drive the data integration research and development effort in the AI/ML research group on problems such as record linkage (entity matching, deduplication, etc.), schema matching, and, more recently, named entity recognition. I lead a few machine learning engineers, with the goal of bringing this data integration work into Informatica products such as MDM and the Claire engine.
Prior to joining Informatica, I received my Ph.D. from the Department of Computer Sciences at the University of Wisconsin-Madison under the guidance of Professor AnHai Doan. During my Ph.D., I built "CloudMatcher", a cloud-based service for entity matching. This system was deployed and used by many real-world users and customers, and it extended the Magellan ecosystem that our group has been working on for the past few years. As a result, I was fortunate to be part of two academic awards: (1) the ACM SIGMOD Research Highlight Award and (2) the CACM Research Highlight Award. My thesis work attracted significant commercial interest, and as a result I co-founded GreenBay Technologies, Inc. in 2019 along with Professor AnHai Doan and Derek Paulsen. GreenBay Technologies, Inc. was later acquired by Informatica in 2020.
Before graduate school, I worked for 7 years in the insurance sector as a software engineer, where my last stint was at Humana Inc. in Green Bay, WI. In 2007, I graduated with a Bachelor's degree in Computer Sciences from Pt. Ravi Shankar Shukla University.
Data management, data integration, entity matching, machine learning, crowdsourcing
Working towards a cloud/crowd-based self-service framework for entity matching (EM): a platform that supports macro- and microservices for performing the different steps of the EM process.
Worked on the VidyaMap project, integrating digital text into design-based science classes using D3, Java, and MySQL.
Backend developer for the Macademia application at the UW Carbone Cancer Center. Developed WCF services to extract publication data from PubMed.
Worked on understanding the copy-on-write (CoW) behaviour of the B-tree file system (Btrfs) and how Btrfs isolates data from metadata.
Working to build and deploy the CloudMatcher solution at AmFam to match customers across multiple databases and to solve other matching use cases in the insurance domain.
Worked on extending the IceFS solution to isolate metadata in the Ext3 file system dynamically, based on the size of the file system. Added space isolation: a cube (an abstraction) is allocated a specific number of block groups, and changes can be made only by an administrator using an online tool. Enhanced user-level tools (mke2fs, dumpe2fs, e2fsck, etc.).
Worked as a backend developer, building and maintaining web services and solutions for agent reporting, commissions, and bonuses. Developed ETL SSIS packages and improved the performance of SQL queries and packages.
Worked as a Mainframe developer/production support analyst for CIGNA.