AnHai Doan
Dept. of Computer Science & Engineering, University of Washington
Box 352350, Seattle, WA 98195
anhai@cs.washington.edu
www.cs.washington.edu/homes/anhai
(206) 616 1842 (office phone)
(206) 543 2969 (office fax)

RESEARCH INTERESTS
Databases and Artificial Intelligence, with an emphasis on applying and extending machine learning techniques to address data integration over the Internet and across enterprises. In particular: schema matching, object identification across multiple sources, schema evolution, user interaction, learning with structured data, and text mining.

EDUCATION
Ph.D., Computer Science, University of Washington, expected June 2002.
Dissertation: Learning to Translate between Structured Representations of Data
Advisors: Professors Alon Halevy and Pedro Domingos

M.S., Computer Science, University of Wisconsin-Milwaukee, 1996.
Thesis: An Abstraction-based Approach to Decision-Theoretic Planning
Advisor: Professor Peter Haddawy

B.S., Computer Science (summa cum laude), Kossuth Lajos University, Hungary, 1993.

AWARDS AND HONORS
Graduate School Fellowship, University of Wisconsin, 1995-1996.
University Fellowship, Kossuth Lajos University, 1991-1993.
America-Hungary Exchange Program Scholarship, Kossuth Lajos University, 1993.
Red Diploma (equivalent to summa cum laude), Kossuth Lajos University, 1991.
Government Scholarship for Undergraduate Studies in Hungary, 1987.
Member of the six-person team representing Vietnam at the 27th Int. Math. Olympiad, 1986.

DISSERTATION: Learning to Translate between Structured Representations of Data
Finding semantic mappings between the schemas of two disparate data sources is a fundamental problem in many data management applications. My thesis presents LSD, a system that applies machine learning techniques to semi-automatically create such mappings. To find the mappings, LSD employs a multi-strategy learning approach: it applies multiple learners, then combines the learners' predictions using a meta-learner. As a result, LSD is extensible and can be easily customized to work on a particular application. Furthermore, the thesis describes GLUE, a system that builds on LSD to learn mappings between ontologies on the Semantic Web. The thesis also makes several contributions to the field of machine learning. To address learning problems brought about by schema and ontology mapping, the thesis presents a novel technique to classify semi-structured data, and an efficient method that employs relaxation labeling to classify interrelated entities.

RESEARCH EXPERIENCE
Research Assistant, University of Washington, 2000-present.
Advisors: Professors Alon Halevy and Pedro Domingos.
Ongoing research in schema matching, a critical step in many data management applications. Designed, implemented, and experimented with LSD, a system that employs and extends machine learning techniques to match the schemas of disparate data sources. Developed a novel learning method to classify semi-structured (e.g., XML) data.

Designed and experimented with GLUE, a system that builds on LSD to learn semantic mappings between ontologies on the Semantic Web. Developed an efficient learning method that employs relaxation labeling to classify interrelated ontology elements.

Research Assistant, University of Washington, 1999.
Advisor: Professor Alon Halevy.
Conducted research on query optimization for data integration. Developed and experimented with techniques to efficiently find the best query plans for a broad variety of plan utility classes.

Research Assistant, University of Washington, 1997-1998.
Advisor: Professor Steve Hanks.
Built a probabilistic AI planner that uses goal regression techniques to efficiently find the best plans in goal-oriented Markov Decision Process settings.

Research Intern, Rockwell Science Laboratory, Palo Alto, CA
Mentor: Dr. Denice Draper.
Implemented and experimented with exact and approximate methods for performing inference on large Bayesian networks.

Intern, Frontier Technologies Corp., Mequon, WI, 1996.
Studied and designed encryption protocols for data transfer across networks. The internship resulted in a permanent job offer in the research and design division.

Research Assistant, University of Wisconsin, Milwaukee, WI, 1993-1996.
Advisor: Professor Peter Haddawy.
Designed, implemented, and experimented with DRIPS, a decision-theoretic AI planner that uses abstraction methods to quickly find the best plans. Applied DRIPS to clinical decision analysis. Developed theoretical frameworks for abstracting probabilistic actions and reasoning with probabilistic intervals.

Research Intern, Institute of Nuclear Research, Hungarian Academy of Sciences, 1991.
Developed a statistical and graphical toolkit to process, visualize, and find interesting patterns in large amounts of data obtained from experiments in nuclear physics.

TEACHING AND MENTORING EXPERIENCE
Teaching Assistant, University of Washington, Fall 1996.
CSE 373, Data Structures and Algorithms, 70 students. Graded assignments and projects, helped students during office hours, assisted in preparing and grading exams, and gave several lectures.

Mentor, University of Washington, Fall 2001.
Jayant Madhavan, graduate student. Supervised a project on learning semantic mappings between ontologies.

Mentor, University of Washington, Spring 2000.
Leonid Tsybert, undergraduate student. Supervised an honors class project on applying decision tree techniques to the schema matching problem.

INVITED TALKS
Generic and Extensible Schema Matching with LSD.
IBM Almaden Research, San Jose, CA, July 2001.

Reconciling Schemas of Disparate Data Sources: A Machine-Learning Approach.
WatchMark Corp. (data mining for wireless services), Bellevue, WA, July 2001.

PROFESSIONAL ACTIVITIES AND SERVICES
Referee for INFORMS Journal on Computing, 2001.

External referee for SIGMOD 2001, VLDB Journal 2001, WebDB 2001, WWW 2002, WISE 2001, AAAI 1996, and UAI 1995-1996.

Creator and developer of the UW online repository of benchmarks and data for schema and ontology matching.

Organized the department's weekly reading group on statistics and machine learning, 2000.

Volunteer, SIGMOD 1998 Conference, Seattle, WA.

Member of ACM, SIGMOD, AAAI, and IEEE.

PUBLICATIONS

PAPERS SUBMITTED OR IN PROGRESS
1. Learning to Map between Ontologies on the Semantic Web, A. Doan, J. Madhavan, P. Domingos, and A. Halevy. Submitted to the World-Wide Web Conference (WWW), 2002.

2. Learning Complex Mappings between Database Schemas, A. Doan, P. Domingos, and A. Halevy. In progress.

INVITED PAPERS
3. Data Integration: A "Killer App" for Multi-Strategy Learning, A. Doan, P. Domingos, and A. Levy. Proceedings of the Workshop on Multi-Strategy Learning (MSL-00), 2000, Guimaraes, Portugal.

PAPERS IN REFEREED JOURNALS
4. Geometric Foundations for Interval-Based Probabilities, V. Ha, A. Doan, V. Vu, and P. Haddawy. Annals of Mathematics and Artificial Intelligence, 24 (1-4), 1998.

5. Decision-Theoretic Refinement Planning in Medical Decision Making: Management of Acute Deep Venous Thrombosis, P. Haddawy, A. Doan, and C. Kahn. Journal of Medical Decision Making, 1996.

PAPERS IN REFEREED CONFERENCES AND WORKSHOPS
6. Efficiently Ordering Query Plans for Data Integration, A. Doan and A. Halevy. Proceedings of the 18th IEEE Int. Conf. on Data Engineering (ICDE-2002). To appear.

7. Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach, A. Doan, P. Domingos, and A. Halevy. Proceedings of the ACM SIGMOD Conf. on Management of Data (SIGMOD-2001).

8. Learning Source Descriptions for Data Integration, A. Doan, P. Domingos, and A. Levy. Proceedings of the 3rd International Workshop on the Web and Databases (WebDB-2000), pages 81-86, 2000. Dallas, TX: ACM SIGMOD.

9. Learning Mappings between Data Schemas, A. Doan, P. Domingos, and A. Levy. Proceedings of the AAAI-2000 Workshop on Learning Statistical Models from Relational Data, 2000, Austin, TX.

10. Efficiently Ordering Query Plans for Data Integration, A. Doan and A. Levy. The IJCAI-99 Workshop on Intelligent Information Integration, Stockholm, Sweden, 1999.

11. Sound Abstraction of Probabilistic Actions in the Constraint Mass Assignment Framework, A. Doan and P. Haddawy. Proceedings of the 12th National Conference on Uncertainty in AI (UAI-96), Portland, Oregon, 1996, pages 228-235.

12. Modeling Probabilistic Actions for Practical Decision-Theoretic Planning, A. Doan. Proceedings of the 3rd International Conference on AI Planning Systems (AIPS-96), Edinburgh, Scotland, May 1996.

13. Decision-Theoretic Planning for Clinical Decision Analysis, A. Doan, P. Haddawy, and C. Kahn. The Working Papers of AI in Medicine Spring Symposium, Stanford, 1996.

14. Efficient Decision-Theoretic Planning: Techniques and Empirical Analysis, P. Haddawy, A. Doan, and R. Goodwin. Proceedings of the 11th National Conference on Uncertainty in AI (UAI-95), Montreal, Canada, August 1995, pages 229-236.

15. Decision-Theoretic Refinement Planning: A New Method for Clinical Decision Analysis, A. Doan, P. Haddawy, and C. Kahn. Proceedings of the 19th Annual Symposium on Computer Applications in Medical Care (SCAMC-95), New Orleans, 1995, pages 299-303.

16. Abstracting Probabilistic Actions, P. Haddawy and A. Doan. Proceedings of the 10th National Conference on Uncertainty in AI (UAI-94), Seattle, July 1994.

OTHER PUBLICATIONS
17. Generating Macro Operators, A. Doan and P. Haddawy. AAAI Spring Symposium on Extended Theories of Action Representation, Stanford 1995.

18. Management of Acute Deep Venous Thrombosis of the Lower Extremities (abstract), C. Kahn, A. Doan, and P. Haddawy. American Roentgen Ray Society Meeting, San Diego, May 1996.

19. An Abstraction-Based Approach to Decision-Theoretic Planning for Partially Observable Metric Domains, A. Doan. Master's Thesis. Technical Report TR-95-12-01, Dept. of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee.

SKILLS
C/C++, Java, Lisp, Perl, XML, Unix, Windows, LaTeX.
Fluent in English, Vietnamese, and Hungarian.

REFERENCES
Available on request.