Selected Publications by Topic
- Semi-Supervised Learning
- Natural Language Processing
- Human Computer Interfaces
- Applications of Statistical Machine Learning
Semi-Supervised Learning
-
Andrew B. Goldberg, Ming Li, and Xiaojin Zhu.
Online Manifold Regularization: A New Learning Setting and Empirical Study.
In The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2008.
Semi-supervised learning from a potentially infinite stream of labeled and unlabeled data, processed sequentially. Extends online learning to cases where most input examples are unlabeled. The key is stochastic gradient descent on any convex semi-supervised risk function, with two practical approximations for manifold regularization: buffering and random projection trees.
[pdf]
-
Xiaojin Zhu, Timothy Rogers, Ruichen Qian, and Chuck Kalish.
Humans perform semi-supervised classification too.
In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
We show that humans determine class boundaries using both labeled and unlabeled data, just like certain semi-supervised machine learning models.
[pdf]
-
Xiaojin Zhu and Andrew Goldberg.
Kernel regression with order preferences.
In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
A linear program to incorporate order preferences ("I think the target value is larger at x1 than at x2") as a regularizer in regression.
[pdf]
[TR 1578 version]
-
Andrew Goldberg, Xiaojin Zhu, and
Stephen Wright.
Dissimilarity in graph-based semi-supervised classification.
In Eleventh International Conference on Artificial Intelligence and
Statistics (AISTATS), 2007.
A convex quadratic program to incorporate cannot-links (two examples should have different labels) into binary and multiclass classification.
Extends graph-based semi-supervised learning to mixed graphs.
[pdf]
-
Xiaojin Zhu, Jaz Kandola, John Lafferty, and
Zoubin Ghahramani.
Graph kernels by spectral transforms.
In O. Chapelle, B. Schölkopf, and A. Zien, editors, Semi-Supervised Learning.
MIT Press, 2006.
Keep the eigenvectors of a graph Laplacian, but optimize the eigenvalues under the constraints that smoother eigenvectors should have larger eigenvalues, to maximize kernel-target alignment on training data. Extended version of NIPS05 paper.
[pdf]
-
Andrew Goldberg and Xiaojin Zhu.
Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization. In HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing, New York, NY, 2006.
Do people like a movie? We extend the classic Pang & Lee movie sentiment paper to semi-supervised learning by building a graph over labeled and unlabeled movie reviews.
[pdf]
-
Xiaojin Zhu.
Semi-supervised learning literature survey.
Technical Report 1530, Department of Computer Sciences, University
of Wisconsin, Madison, 2005.
We review the literature on semi-supervised learning, i.e., machine learning from both labeled and unlabeled data. This online paper is updated frequently to incorporate the latest developments in the field.
[pdf]
-
Xiaojin Zhu.
Semi-Supervised Learning with Graphs.
PhD thesis, Carnegie Mellon University, 2005. CMU-LTI-05-192.
My Ph.D. thesis on graph-based semi-supervised learning, including label propagation, Gaussian random fields and harmonic functions, semi-supervised active learning, graph hyperparameter learning, kernel matrices from graph Laplacian, sparse representation and so on.
[pdf]
-
Xiaojin Zhu and John Lafferty.
Harmonic mixtures:
combining mixture models and graph-based methods for inductive and scalable
semi-supervised learning.
In The 22nd International Conference on Machine Learning (ICML). ACM Press, 2005.
Makes graph-based semi-supervised learning faster and able to handle unseen data, by first modeling data with a mixture model (e.g., GMM), then treating mixture components (instead of individual data points) as nodes in the graph.
[pdf][small teapot data (.mat)]
-
Maria-Florina Balcan, Avrim Blum, Patrick
Pakyan Choi, John Lafferty, Brian Pantano, Mugizi Robert Rwebangira, and
Xiaojin Zhu.
Person identification in webcam images: An application of
semi-supervised learning.
In ICML 2005 Workshop on Learning with Partially Classified Training Data, 2005.
Use abundant unlabeled frames to improve person recognition in webcam images. The graph over webcam image frames uses close-in-time edges, foreground color histogram edges (people with similar apparel), and similar-face edges.
[pdf]
[FreeFoodCam dataset (.tgz 335MB)]
-
Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani,
and John Lafferty.
Nonparametric transforms of graph kernels for semi-supervised
learning. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors,
Advances in Neural Information Processing Systems (NIPS) 17. MIT
Press, Cambridge, MA, 2005.
Keep the eigenvectors of a graph Laplacian, but optimize the eigenvalues under the constraints that smoother eigenvectors should have larger eigenvalues, to maximize kernel-target alignment on training data.
[pdf]
[Matlab code & data]
[QP notes]
-
John Lafferty, Xiaojin Zhu, and Yan Liu.
Kernel conditional random fields: Representation and clique selection.
In The 21st International Conference on Machine Learning (ICML),
2004.
We kernelize Conditional Random Fields, yielding an alternative to Maximum Margin Markov Networks. We propose greedy clique selection in the dual for sparse representation.
[ps]
[pdf]
-
Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty.
Semi-supervised learning using Gaussian fields and harmonic functions.
In The 20th International Conference on Machine Learning (ICML),
2003.
A graph-based semi-supervised learning algorithm that creates a graph over labeled and unlabeled examples. More similar examples are connected by edges with higher weights. The intuition is for the labels to propagate on the graph to unlabeled data. The solution can be found with simple matrix operations, and has strong connections to spectral graph theory.
[ps.gz]
[pdf]
[Matlab code]
[data]
-
Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani.
Combining active learning and semi-supervised learning using Gaussian fields
and harmonic functions. In ICML 2003 workshop on The Continuum from
Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
Actively selects an unlabeled point to ask for the label, by minimizing an estimated classification error (instead of simply picking the most ambiguous unlabeled point). Once the label is obtained, efficiently retrain the classifier with both labeled and unlabeled data.
[ps.gz]
[pdf]
[Matlab code]
-
Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani.
Semi-supervised learning: From Gaussian fields to Gaussian processes. Technical Report CMU-CS-03-175, Carnegie Mellon University, 2003.
We establish the connection between the inverse graph Laplacian and a kernel Gram matrix, and learn hyperparameters for graph weights with evidence maximization. However, this is not a true Gaussian process since unseen points (those not among the labeled and unlabeled training data) are not handled well.
[ps.gz]
[pdf]
-
Xiaojin Zhu and Zoubin Ghahramani.
Learning from labeled and unlabeled data with label propagation.
Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
Precursor of the ICML03 paper. The intuition of label propagation is introduced, together with an iterative algorithm which amounts to a relaxation method.
[ps.gz]
[pdf]
-
Xiaojin Zhu and Zoubin Ghahramani.
Towards semi-supervised classification with Markov random fields. Technical Report
CMU-CALD-02-106, Carnegie Mellon University, 2002.
Yet another precursor of the ICML03 paper. The graph is defined, but as a Boltzmann machine (discrete states) rather than the later Gaussian random fields (continuous states). Inference, with MCMC, is difficult.
[ps.gz]
[pdf]
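The "simple matrix operations" mentioned in the Gaussian fields and harmonic functions entry, and the iterative algorithm from the label propagation report, can be illustrated on a toy graph. The following is a minimal sketch, not the linked Matlab code: it assumes a symmetric, nonnegative weight matrix `W` and real-valued labels, and the function names and the five-node chain graph are hypothetical choices made for illustration. The closed form solves f_u = (D_uu - W_uu)^{-1} W_ul f_l; the iterative version repeatedly averages neighbor values while clamping the labeled nodes.

```python
import numpy as np

def harmonic_solution(W, labeled_idx, y_labeled):
    """Closed-form harmonic solution on the unlabeled nodes."""
    W = np.asarray(W, dtype=float)
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(W.shape[0]), labeled_idx)
    # Graph Laplacian L = D - W, with D the diagonal degree matrix.
    laplacian = np.diag(W.sum(axis=1)) - W
    L_uu = laplacian[np.ix_(unlabeled_idx, unlabeled_idx)]
    W_ul = W[np.ix_(unlabeled_idx, labeled_idx)]
    f_u = np.linalg.solve(L_uu, W_ul @ np.asarray(y_labeled, dtype=float))
    return unlabeled_idx, f_u

def label_propagation(W, labeled_idx, y_labeled, n_iter=200):
    """Iterative version: propagate, then clamp the labeled nodes."""
    W = np.asarray(W, dtype=float)
    labeled_idx = np.asarray(labeled_idx)
    unlabeled_idx = np.setdiff1d(np.arange(W.shape[0]), labeled_idx)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalized transition matrix
    f = np.zeros(W.shape[0])
    f[labeled_idx] = y_labeled
    for _ in range(n_iter):
        f = P @ f                   # each node takes the weighted mean of its neighbors
        f[labeled_idx] = y_labeled  # labeled nodes keep their given labels
    return unlabeled_idx, f[unlabeled_idx]

# Toy example (hypothetical): a chain 0 - 1 - 2 - 3 - 4 with unit edge weights.
W_chain = np.zeros((5, 5))
for i in range(4):
    W_chain[i, i + 1] = W_chain[i + 1, i] = 1.0

# Node 0 is labeled 0, node 4 is labeled 1; nodes 1..3 are unlabeled.
idx_closed, f_closed = harmonic_solution(W_chain, [0, 4], [0.0, 1.0])
idx_iter, f_iter = label_propagation(W_chain, [0, 4], [0.0, 1.0])
print(idx_closed, f_closed)
print(idx_iter, f_iter)
```

On this chain the unlabeled values interpolate linearly between the two labeled ends, and the iterative scheme converges to the same values as the closed form. Thresholding the values at 0.5 is one simple way to turn them into class labels.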
Natural Language Processing
-
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, and Robert Nowak.
Learning bigrams from unigrams.
In The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), 2008.
If I give you a text document in bag-of-words (unigram count vector) format, you will not know the order between words.
What if I give you 10,000 documents, each in bag-of-words format?
Surprisingly, we can partially recover a bigram language model just from these bag-of-words documents.
[pdf]
-
Xiaojin Zhu, Andrew Goldberg, Jurgen Van
Gael, and David Andrzejewski.
Improving diversity in ranking using absorbing random walks. In Human Language Technologies: The Annual Conference
of the North American Chapter of the Association for Computational Linguistics
(NAACL-HLT), 2007.
A ranking algorithm (GRASSHOPPER) that is similar to PageRank but encourages diversity in top ranked items, by turning already ranked items into absorbing states to penalize remaining similar items.
[pdf]
[code]
-
Jurgen Van Gael and Xiaojin Zhu.
Correlation clustering for crosslingual link detection.
In International Joint Conference on Artificial Intelligence (IJCAI), 2007.
Cluster news articles in different languages by event. A practical implementation of correlation clustering that involves linear program chunking.
[pdf][data]
-
Gregory Druck, Chris Pal, Xiaojin Zhu, and Andrew McCallum.
Semi-supervised classification with hybrid generative/discriminative methods.
In The Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
[pdf]
-
Jordan Boyd-Graber, David Blei, and Xiaojin Zhu.
A topic model for word sense disambiguation. In
Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL), 2007.
[pdf]
-
SaiSuresh Krishnakumaran and Xiaojin
Zhu.
Hunting elusive metaphors using lexical resources.
In NAACL 2007
Workshop on Computational Approaches to Figurative Language, 2007.
Identify "The soldier is a lion" as a metaphor by noting the lack of WordNet hyponym relationship between "soldier" and "lion".
Extends to verb-noun or adjective-noun pairs using Google Web 1T bigram counts.
[pdf]
[data]
-
Ronald Rosenfeld, Stanley Chen, and Xiaojin
Zhu. Whole-sentence exponential language models: a vehicle for linguistic-statistical integration.
Computer Speech and Language, 15(1), 2001.
Directly model the probability of a sentence with an exponential model, instead of using the chain rule on words. Can use arbitrary, long range features.
[pdf]
-
Xiaojin Zhu and Ronald Rosenfeld.
Improving trigram language modeling with the World Wide Web. In Proceedings of
the International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2001.
Estimating n-gram probabilities by submitting word sequences as phrase queries to search engines.
[pdf]
[tech report version CMU-CS-00-171 ps]
-
Xiaojin Zhu, Stanley F. Chen, and Ronald
Rosenfeld.
Linguistic features for whole sentence maximum entropy language
models. In Proceedings of the 5th European Conference on Speech Communication
and Technology (Eurospeech), 1999.
Parse a real corpus and a trigram-generated corpus using a shallow parser. Identify features that behave differently in the two corpora. Use them to build a better language model.
[ps]
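The idea behind the absorbing random walk ranking entry above (GRASSHOPPER) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the linked code: the teleporting prior, the mixing weight `lam`, the use of the stationary distribution for the first item, and the use of expected visit counts from the fundamental matrix for later items are assumptions about the details, and the function name and toy similarity matrix are hypothetical. Consult the paper and the authors' code for the actual algorithm.

```python
import numpy as np

def grasshopper_sketch(W, prior, lam=0.9, k=None):
    """Rank items so that top items are both central and diverse."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    k = n if k is None else k
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    # Random walk on the similarity graph, mixed with teleporting to the prior.
    P = lam * W / W.sum(axis=1, keepdims=True) + (1.0 - lam) * prior[None, :]

    # First item: highest stationary probability (found by power iteration).
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):
        pi = pi @ P
    ranked = [int(np.argmax(pi))]

    # Later items: ranked items become absorbing states, so the walk spends
    # little time near them and items similar to them are penalized.
    while len(ranked) < k:
        rest = [i for i in range(n) if i not in ranked]
        Q = P[np.ix_(rest, rest)]                  # transitions among unranked items
        N = np.linalg.inv(np.eye(len(rest)) - Q)   # fundamental matrix
        visits = N.sum(axis=0) / len(rest)         # expected visits, averaged over starts
        ranked.append(rest[int(np.argmax(visits))])
    return ranked

# Toy example (hypothetical): items 0-2 form one tight cluster, items 3-4 another.
W_toy = np.full((5, 5), 0.01)
W_toy[:3, :3] = 1.0
W_toy[3:, 3:] = 1.0
ranking = grasshopper_sketch(W_toy, prior=np.ones(5))
print(ranking)
```

In this toy graph the first pick comes from the larger cluster, and the second pick comes from the other cluster rather than from the first item's near-duplicates, which is the diversity effect the entry describes.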
Human Computer Interfaces
-
Andrew B. Goldberg, Xiaojin Zhu, Charles R. Dyer, Mohamed Eldawy, and Lijie Heng.
Easy as ABC? Facilitating pictorial communication via semantically enhanced layout.
In Twelfth Conference on Computational Natural Language Learning (CoNLL), 2008.
If you have pictures for individual words in a sentence, how do you compose them to best convey the meaning of the sentence? We learn an "ABC" layout using semantic role labeling and conditional random fields, and conduct a user study.
[pdf]
-
Xiaojin Zhu, Andrew Goldberg, Mohamed Eldawy, Charles Dyer, and Bradley Strock.
A text-to-picture synthesis system for augmenting communication.
In The Integrated Intelligence Track of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
Synthesizing a picture from general, unrestricted natural language text, to convey the gist of the text.
[pdf]
-
Stefanie Shriver, Arthur Toth, Xiaojin
Zhu, Alex Rudnicky, and Roni Rosenfeld.
A unified design for human-machine
voice interaction. In Human Factors in Computing Systems (CHI).
ACM Press, 2001.
In order for humans to use speech interfaces, they might need to learn how to speak to machines.
[ps]
-
Ronald Rosenfeld, Xiaojin Zhu, Stefanie
Shriver, Arthur Toth, Kevin Lenzo, and Alan Black.
Towards a universal
speech interface. In International Conference on Spoken Language Processing
(ICSLP), 2000.
A general speech input paradigm that attempts to structure human speech to facilitate speech recognition.
[pdf]
-
Xiaojin Zhu, Jie Yang, and Alex Waibel.
Segmenting hands of arbitrary color.
In Fourth IEEE International Conference on
Automatic Face and Gesture Recognition, 2000.
We model the color histogram of a scene by a Gaussian mixture model, in which one of the mixture components corresponds to the hand.
[ps.gz]
-
Jie Yang, Xiaojin Zhu, Ralph Gross, John
Kominek, Yue Pan, and Alex Waibel.
Multimodal people ID for multimedia
meeting browser. In The Seventh ACM International Multimedia Conference,
1999.
Use face recognition, speaker identification, color histogram, and sound direction to identify meeting participants.
[link]
Applications of Statistical Machine Learning
-
Xiaojin Zhu, Michael Coen, Shelley Prudom, Ricki Colman, and Joseph Kemnitz.
Online learning in monkeys.
In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.
(short paper)
We compare rhesus monkeys playing the Wisconsin Card Sorting Task to online machine learning algorithms.
[pdf]
-
Nathan Rosenblum, Xiaojin Zhu, Barton Miller, and Karen Hunt.
Learning to analyze binary computer code.
In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.
An extended version of the NIPS07 workshop paper, including high throughput computing and a formal analysis of self-repairing disassembly.
[pdf]
-
Nathan Rosenblum, Xiaojin Zhu, Barton Miller, and Karen Hunt.
Machine Learning-Assisted Binary Code Analysis.
In NIPS workshop on Machine Learning in Adversarial Environments for Computer Security, 2007.
Identify function entry points in binary code using Markov Random Fields on both local instruction patterns and global control flow structures.
-
David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu.
Statistical debugging using latent topic models.
In Proceedings of the 18th European Conference on Machine Learning (ECML), 2007.
Representing software execution traces using "bag-of-words", where the words are instrumented probes in the software. A Delta-Latent-Dirichlet-Allocation (ΔLDA) model to identify weak latent topics that correspond to distinct software bugs.
[pdf]
-
Mariyam Mirza, Joel Sommers, Paul Barford,
and Xiaojin Zhu. A machine learning approach to TCP throughput prediction.
In The International Conference on Measurement and Modeling of Computer
Systems (ACM SIGMETRICS), 2007.
Apply Support Vector Regression to predict Internet file transfer rate from measurable features of the network.
[pdf]