Current Research

I am working with Professor Jerry Zhu on several research projects in statistical machine learning and natural language processing. Related publications can be found here.

Semi-Supervised Learning

We are investigating several forms of semi-supervised learning. Currently, we are experimenting with online (i.e., incremental) semi-supervised learning with kernels. We recently published a paper (with Stephen Wright), to appear at AISTATS 2007, introducing a method for binary and multi-class classification that uses dissimilarity constraints, as opposed to the typical similarity information, to improve accuracy when labeled data is limited. Similarly, we have developed a method that uses order preferences to improve semi-supervised regression (UW tech report).

Text-to-Picture Synthesis

We are also actively working on a text-to-picture project whose goal is to automatically produce a set of images that captures the main meaning of an arbitrary piece of text. Our current approach combines several natural language processing and computer vision techniques to select keyphrases in the text that should appear in the picture and to find appropriate images to represent each of these phrases.

Sentiment Analysis

We are investigating statistical machine learning approaches to the sentiment classification and sentiment rating problems. Our first paper, which appeared at the TextGraphs workshop at the HLT-NAACL 2006 conference, introduces a graph-based semi-supervised learning algorithm for sentiment rating inference. This work was based on four corpora of movie review data. We have also built a corpus of Amazon.com product reviews spanning four product categories, and we are now looking at ways to do cross-domain analysis.
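
To give a flavor of graph-based semi-supervised rating inference, here is a minimal sketch in Python. It is not the algorithm from the paper: it assumes a generic scheme in which each unlabeled review repeatedly takes the similarity-weighted average of its neighbors' ratings while labeled reviews stay fixed. The function name propagate_ratings and its parameters are hypothetical, chosen only for this illustration.

```python
def propagate_ratings(weights, labeled, n_iter=200):
    """Illustrative rating propagation on a similarity graph (not the paper's method).

    weights: symmetric n x n list of lists of nonnegative similarities
    labeled: dict mapping node index -> known rating
    Returns a list with one rating per node.
    """
    n = len(weights)
    # Start unlabeled nodes at the mean of the known ratings.
    mean = sum(labeled.values()) / len(labeled)
    ratings = [labeled.get(i, mean) for i in range(n)]
    for _ in range(n_iter):
        for i in range(n):
            if i in labeled:
                continue  # labeled nodes are clamped to their known rating
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total == 0:
                continue  # isolated node: keep its current value
            ratings[i] = sum(
                weights[i][j] * ratings[j] for j in range(n) if j != i
            ) / total
    return ratings
```

For a three-node chain whose end nodes are labeled 1.0 and 3.0 and whose edges have equal weight, the middle node settles at 2.0.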

Past Research

During my first year at UW-Madison, I had an independent study with Professor Michael Ferris in which I experimented with implementing various knowledge-based support vector machine formulations in Matlab. This was based on recent work by Olvi Mangasarian and Jude Shavlik in the bioinformatics domain (e.g., Wisconsin Breast Cancer Database).

As an undergraduate at Amherst College, I spent a year researching genetic algorithms. This work culminated in a senior honors thesis in which I developed a genetic algorithm to find near-optimal schedules for a soccer league. Specifically, I worked with data and prior knowledge acquired from the New England Small College Athletic Conference (NESCAC) to build a real-world system capable of creating schedules that optimize various criteria (e.g., minimizing the total distance traveled by all teams, ensuring that big rivalry games occur on key dates). The work was a success, as league administrators found the resulting schedules desirable.
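
For readers unfamiliar with genetic algorithms, the following Python sketch shows the general idea on a toy problem. It is not the thesis system: it assumes a single ordering of venues and a made-up distance matrix, and it minimizes only travel distance. All names (travel_cost, order_crossover, swap_mutation, evolve) and all parameter values are hypothetical.

```python
import random


def travel_cost(order, dist):
    """Total distance of visiting the venues in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def order_crossover(p1, p2, rng):
    """Copy a slice from p1 and fill the rest in p2's order (stays a permutation)."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    rest = [v for v in p2 if v not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(order, rng, rate=0.2):
    """With probability `rate`, swap two positions in a copy of the order."""
    order = order[:]
    if rng.random() < rate:
        a, b = rng.sample(range(len(order)), 2)
        order[a], order[b] = order[b], order[a]
    return order


def evolve(venues, dist, pop_size=30, generations=100, seed=0):
    """Toy genetic algorithm: returns the lowest-cost ordering found."""
    rng = random.Random(seed)
    # Seed the population with the given order plus random permutations.
    population = [list(venues)] + [
        rng.sample(venues, len(venues)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda o: travel_cost(o, dist))
        survivors = population[: pop_size // 2]  # elitism: keep the best half
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = survivors + children
    return min(population, key=lambda o: travel_cost(o, dist))
```

Because the best half of each generation is carried over unchanged, the returned ordering is never worse than the starting one. A real scheduler would need a richer encoding and a fitness function that weighs several criteria together, such as rivalry dates alongside travel.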