Research
We are currently working on the following problems:
- Designing the overall architecture
- Implementing SmartCat v1, the first open-source version
- Table and column name expansion
- Generating textual descriptions and tags for tables
- Schema matching
- Discovering relationships among tables, such as related, unionable, joinable, and lineage
- Taxonomy construction for browsing tables
- Keyword search over original and enriched metadata
- Curation with data stewards, in-house workers, and crowdsourcing workers
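
As a rough illustration of one problem above, discovering joinable columns, the sketch below scores column pairs by the overlap of their value sets. It is not SmartCat's method: the function names, the containment measure, the 0.8 threshold, and the toy tables are all assumptions made for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections of values, compared as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def containment(a, b):
    """Fraction of the distinct values in a that also appear in b."""
    sa, sb = set(a), set(b)
    if not sa:
        return 0.0
    return len(sa & sb) / len(sa)


def joinable_pairs(table_a, table_b, threshold=0.8):
    """Return (column_a, column_b, score) for candidate join columns.

    Tables are plain dicts mapping column name -> list of values
    (a hypothetical representation chosen for this sketch). A pair
    qualifies when containment in either direction meets the threshold.
    """
    pairs = []
    for name_a, values_a in table_a.items():
        for name_b, values_b in table_b.items():
            score = max(containment(values_a, values_b),
                        containment(values_b, values_a))
            if score >= threshold:
                pairs.append((name_a, name_b, score))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda p: -p[2])


# Toy tables, invented for illustration only.
orders = {"cust_id": [1, 2, 3, 3], "amt": [10.0, 25.5, 7.0, 3.2]}
customers = {"id": [1, 2, 3, 4, 5], "name": ["Ann", "Bo", "Cy", "Di", "Ed"]}

print(joinable_pairs(orders, customers))
```

Containment is used here rather than Jaccard because a foreign-key column often covers only part of the key column it refers to. Jaccard, which penalizes that asymmetry, is a more natural fit for a unionability check.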
Publications
- Columbo: Expanding Abbreviated Column Names for Tabular Data Using Large Language Models, T. Cai, S. Sheen, A. Doan. EMNLP-25.
People and Funding
- Ting Cai, Minh Phan, Mark Tervo
- This project is supported by NSF Medium Grant 2504787