Selected Publications by Topic
- Semi-Supervised Learning
- Latent Topic Models
- Machine Learning for Cognitive Science
- Natural Language Processing
- Human Computer Interfaces
- Applications of Statistical Machine Learning
Semi-Supervised Learning
-
Andrew Goldberg, Xiaojin Zhu, Benjamin Recht, Jun-Ming Xu, and Robert Nowak.
Transduction with matrix completion: Three birds with one stone.
In Advances in Neural Information Processing Systems (NIPS) 23, 2010.
Find it difficult to do transductive learning on multi-label data with many missing features and missing labels? Let matrix completion help.
[pdf]
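The underlying tool can be sketched in a few lines of Soft-Impute-style singular value thresholding; this is a stand-in for the paper's actual algorithm, which completes a stacked [labels; features] matrix with appropriate losses so labels and features are filled in jointly.

```python
# Sketch: low-rank matrix completion by iterative singular-value
# soft-thresholding (Soft-Impute style). Illustrates the tool only; the
# paper completes a stacked [labels; features] matrix with other losses.
import numpy as np

def soft_impute(M, observed, lam=1.0, iters=100):
    """M: matrix (arbitrary values at unobserved entries).
    observed: boolean mask of known entries."""
    Z = np.zeros_like(M)
    for _ in range(iters):
        filled = np.where(observed, M, Z)   # keep known entries, fill the rest
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)        # soft-threshold singular values
        Z = (U * s) @ Vt
    return Z

rng = np.random.default_rng(0)
truth = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 40))  # low rank
mask = rng.random(truth.shape) < 0.4                                 # 40% observed
est = soft_impute(np.where(mask, truth, 0.0), mask, lam=0.5)
print("RMSE on missing entries:",
      np.sqrt(np.mean((est[~mask] - truth[~mask]) ** 2)))
```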
-
Xiaojin Zhu and Andrew B. Goldberg.
Introduction to Semi-Supervised Learning.
Morgan & Claypool, 2009.
A short, self-contained introductory book to semi-supervised learning.
For
advanced undergraduates, entry-level graduate students and researchers in Computer Science, Electrical Engineering, Statistics, Psychology, etc.
You may already have access to the book through your institution -- check "Access" on the link.
[link]
-
Xiaojin Zhu.
Semi-Supervised Learning.
Encyclopedia entry in Claude Sammut and Geoffrey Webb, editors, Encyclopedia of Machine Learning. Springer, to appear.
A concise, technical summary of semi-supervised learning.
[pdf]
-
Andrew B. Goldberg and Xiaojin Zhu.
Keepin' it real: Semi-supervised learning with realistic tuning.
In NAACL 2009 Workshop on Semi-supervised Learning for NLP, 2009.
Cross-validation on accuracy is effective for tuning semi-supervised learners even with as few as 10 labeled items (see the sketch below).
[pdf]
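A minimal sketch of the tuning protocol, with scikit-learn's LabelPropagation standing in for the semi-supervised learner (the stand-in learner, the dataset, and the gamma grid are my illustrative assumptions, not the paper's setup):

```python
# Sketch: choose an SSL hyperparameter by leave-one-out cross-validation
# over just 10 labeled points; the unlabeled pool (y = -1) joins every fit.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import LeaveOneOut
from sklearn.semi_supervised import LabelPropagation

X, y = make_moons(n_samples=300, noise=0.1, random_state=0)
labeled = np.arange(10)                          # only 10 labeled items
best_gamma, best_acc = None, -1.0
for gamma in [0.5, 1, 5, 20, 50]:
    correct = 0
    for train, test in LeaveOneOut().split(labeled):
        y_partial = np.full(len(y), -1)          # -1 marks unlabeled
        y_partial[labeled[train]] = y[labeled[train]]
        model = LabelPropagation(kernel="rbf", gamma=gamma, max_iter=2000)
        model.fit(X, y_partial)
        correct += model.predict(X[labeled[test]])[0] == y[labeled[test][0]]
    if correct / len(labeled) > best_acc:
        best_gamma, best_acc = gamma, correct / len(labeled)
print("chosen gamma:", best_gamma, "CV accuracy:", best_acc)
```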
-
Andrew Goldberg, Xiaojin Zhu, Aarti Singh, Zhiting Xu, and Robert Nowak.
Multi-manifold semi-supervised learning.
In Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
What if the data consists of multiple, intersecting manifolds? We cut the ambient space into pieces by clustering unlabeled data using a Hellinger distance metric sensitive to manifold dimensionality, orientation, and density. Within each piece, we then perform supervised learning using the labeled data.
[pdf]
-
Aarti Singh, Robert Nowak, and Xiaojin Zhu.
Unlabeled data: Now it helps, now it doesn't.
In Advances in Neural Information Processing Systems (NIPS) 21, 2008.
Is semi-supervised learning (SSL) better than supervised learning (SL) in theory?
We prove that as the two classes move closer together and eventually overlap,
there are several distinct regimes in which SSL outperforms SL, and others in which SSL is no better than SL.
[preprint: pdf | extended tech report | Errata]
-
Andrew B. Goldberg, Ming Li, and Xiaojin Zhu.
Online Manifold Regularization: A New Learning Setting and Empirical Study.
In The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), 2008.
Semi-supervised learning from a potentially infinite sequence of labeled and unlabeled data. Extends online learning to the case where most input examples are unlabeled. The key is stochastic gradient descent on any convex semi-supervised risk function, with two practical approximations for manifold regularization: buffering and a random projection tree (a schematic sketch follows below).
[pdf]
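A schematic sketch of the buffering idea for a linear classifier (my simplification for brevity; the paper works with kernel classifiers and adds the random projection tree as a second approximation):

```python
# Sketch: stochastic gradient steps on hinge loss (labeled points only),
# an L2 term, and a manifold term that ties the current point's score to
# the scores of recently seen points kept in a fixed-size buffer.
import numpy as np
from collections import deque

def rbf(a, b, bw=1.0):
    return np.exp(-np.sum((a - b) ** 2) / (2 * bw ** 2))

def online_manifold_sgd(stream, dim, lam1=0.01, lam2=0.1, buf=50, eta0=0.1):
    w, buffer = np.zeros(dim), deque(maxlen=buf)
    for t, (x, y) in enumerate(stream, start=1):    # y is None if unlabeled
        grad = lam1 * w
        if y is not None and y * (w @ x) < 1:       # hinge subgradient
            grad -= y * x
        for xb in buffer:                           # manifold regularizer
            grad += lam2 * rbf(x, xb) * (w @ x - w @ xb) * (x - xb)
        w -= (eta0 / np.sqrt(t)) * grad
        buffer.append(x)
    return w

rng = np.random.default_rng(0)
def stream(n=2000):
    for i in range(n):
        y = rng.choice([-1, 1])
        x = rng.normal(loc=2.0 * y, scale=1.0, size=2)
        yield x, (y if i % 10 == 0 else None)       # 90% unlabeled
print("learned weights:", online_manifold_sgd(stream(), dim=2))
```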
-
Xiaojin Zhu and Andrew Goldberg.
Kernel regression with order preferences.
In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
A linear program to incorporate order preferences ("I think the target value is larger at x1 than at x2") as regularizer in regression.
[pdf]
[TR 1578 version]
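The construction is concrete enough to sketch. Below is a simplified linear-model version with L1 loss via scipy (the paper's formulation is kernelized, so treat the model and constants here as illustrative): each order preference gets a slack variable that the objective penalizes.

```python
# Sketch: regression where each order preference "f(x_a) >= f(x_b)" gets a
# slack variable, and the LP trades off L1 fitting error against slack.
import numpy as np
from scipy.optimize import linprog

def fit_with_order_prefs(X, y, prefs, C=1.0):
    """prefs: list of (a, b) index pairs meaning f(X[a]) >= f(X[b])."""
    n, d = X.shape
    m = len(prefs)
    # Variables: [w (d) | r (n) absolute residuals | s (m) slacks]
    c = np.concatenate([np.zeros(d), np.ones(n), C * np.ones(m)])
    rows, rhs = [], []
    for i in range(n):                       # |w.x_i - y_i| <= r_i
        e_r = np.zeros(n); e_r[i] = -1.0
        rows.append(np.concatenate([X[i], e_r, np.zeros(m)])); rhs.append(y[i])
        rows.append(np.concatenate([-X[i], e_r, np.zeros(m)])); rhs.append(-y[i])
    for k, (a, b) in enumerate(prefs):       # w.x_a >= w.x_b - s_k
        e_s = np.zeros(m); e_s[k] = -1.0
        rows.append(np.concatenate([X[b] - X[a], np.zeros(n), e_s]))
        rhs.append(0.0)
    bounds = [(None, None)] * d + [(0, None)] * (n + m)
    res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return res.x[:d]

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.1, 0.9, 2.2, 2.8])
print("slope:", fit_with_order_prefs(X, y, prefs=[(3, 0)]))
```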
-
Andrew Goldberg, Xiaojin Zhu, and
Stephen Wright.
Dissimilarity in graph-based semi-supervised classification.
In Eleventh International Conference on Artificial Intelligence and
Statistics (AISTATS), 2007.
A convex quadratic program to incorporate cannot-links (two examples should have different labels) into binary and multiclass classification.
Extends graph-based semi-supervised learning to mixed graphs.
[pdf]
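For the binary case the essence fits in a short sketch (my simplification, which drops the QP's additional constraints): penalize (f_i - f_j)^2 on similarity edges and (f_i + f_j)^2 on dissimilarity edges, clamp the labeled nodes, and solve a linear system.

```python
# Sketch: graph-based SSL on a mixed graph with cannot-links (binary case).
import numpy as np

def mixed_graph_ssl(W_sim, W_dis, labeled, f_l):
    n = W_sim.shape[0]
    L = np.diag(W_sim.sum(1)) - W_sim       # penalizes (f_i - f_j)^2
    M = np.diag(W_dis.sum(1)) + W_dis       # penalizes (f_i + f_j)^2
    A = L + M
    u = np.setdiff1d(np.arange(n), labeled)
    f = np.zeros(n)
    f[labeled] = f_l
    f[u] = np.linalg.solve(A[np.ix_(u, u)], -A[np.ix_(u, labeled)] @ f_l)
    return f

# Nodes 0-1 and 2-3 are similar; 1 and 2 have a cannot-link; node 0 is
# labeled +1. The solution is [1, 1, -1, -1]: the cannot-link flips the
# sign of the second pair.
W_sim = np.zeros((4, 4))
W_sim[0, 1] = W_sim[1, 0] = W_sim[2, 3] = W_sim[3, 2] = 1.0
W_dis = np.zeros((4, 4))
W_dis[1, 2] = W_dis[2, 1] = 1.0
print(mixed_graph_ssl(W_sim, W_dis, labeled=np.array([0]), f_l=np.array([1.0])))
```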
-
Xiaojin Zhu, Jaz Kandola, John Lafferty, and
Zoubin Ghahramani.
Graph kernels by spectral transforms.
In O. Chapelle, B. Schölkopf, and A. Zien, editors, Semi-Supervised Learning.
MIT Press, 2006.
Keep the eigenvectors of a graph Laplacian, but optimize the eigenvalues under the constraints that smoother eigenvectors should have larger eigenvalues, to maximize kernel-target alignment on training data. Extended version of NIPS05 paper.
[pdf]
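A sketch of the optimization, simplified from the paper's QP to a linear program by fixing the kernel's trace and maximizing the unnormalized alignment:

```python
# Sketch: keep the Laplacian eigenvectors, relearn the kernel eigenvalues.
import numpy as np
from scipy.optimize import linprog

def spectral_transform(L, labeled, y):
    """L: graph Laplacian; y: +/-1 labels on the `labeled` indices."""
    n = L.shape[0]
    evals, V = np.linalg.eigh(L)             # ascending: smoothest first
    # Alignment on labeled data: <K_L, y y^T> = sum_i mu_i (y . v_i[L])^2
    c = -np.array([(y @ V[labeled, i]) ** 2 for i in range(n)])
    A_ub = np.zeros((n - 1, n))              # mu_1 >= mu_2 >= ... >= 0:
    for i in range(n - 1):                   # smoother eigenvectors get
        A_ub[i, i], A_ub[i, i + 1] = -1.0, 1.0   # larger kernel eigenvalues
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n - 1),
                  A_eq=np.ones((1, n)), b_eq=[1.0],  # fix trace(K) = 1
                  bounds=[(0, None)] * n)
    return (V * res.x) @ V.T                 # learned kernel V diag(mu) V^T

W = np.zeros((6, 6))                         # a 6-node chain graph
for i in range(5):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(1)) - W
K = spectral_transform(L, labeled=np.array([0, 5]), y=np.array([1.0, -1.0]))
print(np.round(K, 2))
```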
-
Andrew Goldberg and Xiaojin Zhu.
Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization.
In HLT-NAACL 2006 Workshop on Textgraphs: Graph-based Algorithms for Natural Language Processing, New York, NY, 2006.
Do people like a movie? We extend the classic Pang & Lee movie sentiment paper to semi-supervised learning by building a graph over labeled and unlabeled movie reviews.
[pdf]
-
Xiaojin Zhu.
Semi-supervised learning literature survey.
Technical Report 1530, Department of Computer Sciences, University
of Wisconsin, Madison, 2005.
We review the literature on semi-supervised learning, i.e., machine learning from both labeled and unlabeled data. This online survey is updated periodically to incorporate the latest developments in the field.
[pdf]
-
Xiaojin Zhu.
Semi-Supervised Learning with Graphs.
PhD thesis, Carnegie Mellon University, 2005. CMU-LTI-05-192.
My Ph.D. thesis on graph-based semi-supervised learning, including label propagation, Gaussian random fields and harmonic functions, semi-supervised active learning, graph hyperparameter learning, kernel matrices from the graph Laplacian, sparse representation, and so on.
[pdf]
-
Xiaojin Zhu and John Lafferty.
Harmonic mixtures:
combining mixture models and graph-based methods for inductive and scalable
semi-supervised learning.
In The 22nd International Conference on Machine Learning (ICML). ACM Press, 2005.
Making graph-based semi-supervised learning faster and handling unseen data, by first modeling data with a mixture model (e.g., GMM), then treating mixture components (instead of individual data points) as nodes in the graph.
[pdf][small teapot data (.mat)]
-
Maria-Florina Balcan, Avrim Blum, Patrick
Pakyan Choi, John Lafferty, Brian Pantano, Mugizi Robert Rwebangira, and
Xiaojin Zhu.
Person identification in webcam images: An application of
semi-supervised learning.
In ICML 2005 Workshop on Learning with Partially Classified Training Data, 2005.
Use abundant unlabeled frames to improve person recognition from a webcam. The graph over webcam image frames uses close-in-time edges, foreground color histogram edges (people wearing similar apparel), and similar-face edges.
[pdf]
[FreeFoodCam dataset (.tgz 335MB)]
-
Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani,
and John Lafferty.
Nonparametric transforms of graph kernels for semi-supervised
learning. In Lawrence K. Saul, Yair Weiss, and Léon Bottou, editors,
Advances in Neural Information Processing Systems (NIPS) 17. MIT
Press, Cambridge, MA, 2005.
Keep the eigenvectors of a graph Laplacian, but optimize the eigenvalues under the constraints that smoother eigenvectors should have larger eigenvalues, to maximize kernel-target alignment on training data.
[pdf]
[Matlab code & data]
[QP notes]
-
John Lafferty, Xiaojin Zhu, and Yan Liu.
Kernel conditional random fields: Representation and clique selection.
In The 21st International Conference on Machine Learning (ICML),
2004.
We kernelize Conditional Random Fields, an alternative to Maximum Margin Markov Networks, and propose greedy clique selection in the dual for a sparse representation.
[ps]
[pdf]
-
Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty.
Semi-supervised learning using Gaussian fields and harmonic functions.
In The 20th International Conference on Machine Learning (ICML),
2003.
A graph-based semi-supervised learning algorithm that creates a graph over labeled and unlabeled examples. More similar examples are connected by edges with higher weights. The intuition is for the labels to propagate on the graph to unlabeled data. The solution can be found with simple matrix operations, and has strong connections to spectral graph theory.
[ps.gz]
[pdf]
[Matlab code]
[data]
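A minimal numpy sketch of the closed-form solution for binary labels (the linked Matlab code is the authors' own implementation):

```python
# Sketch: harmonic function on a graph. Labeled nodes are clamped; each
# unlabeled node becomes the weighted average of its neighbors, i.e.
# f_u = (D_uu - W_uu)^{-1} W_ul f_l.
import numpy as np

def harmonic(W, labeled, f_l):
    n = W.shape[0]
    u = np.setdiff1d(np.arange(n), labeled)
    A = np.diag(W.sum(1))[np.ix_(u, u)] - W[np.ix_(u, u)]
    f = np.zeros(n)
    f[labeled] = f_l
    f[u] = np.linalg.solve(A, W[np.ix_(u, labeled)] @ f_l)
    return f

# Chain graph 0-1-2-3 with the ends labeled 1 and 0: the interior
# interpolates to [1, 2/3, 1/3, 0].
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0
print(harmonic(W, labeled=np.array([0, 3]), f_l=np.array([1.0, 0.0])))
```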
-
Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani.
Combining active learning and semi-supervised learning using Gaussian fields
and harmonic functions. In ICML 2003 workshop on The Continuum from
Labeled to Unlabeled Data in Machine Learning and Data Mining, 2003.
Actively select an unlabeled point to query, by minimizing the estimated classification error (instead of simply picking the most ambiguous unlabeled point); a naive sketch follows below. Once the label is obtained, efficiently retrain the classifier with both labeled and unlabeled data.
[ps.gz]
[pdf]
[Matlab code]
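A naive sketch of the selection rule, recomputing the harmonic solution from scratch for each candidate and label guess (the paper gets the same answer far more cheaply with efficient updates):

```python
# Sketch: query the unlabeled point whose answer minimizes the expected
# estimated risk, weighting the two possible labels by the current
# harmonic prediction f(k).
import numpy as np

def harmonic(W, labeled, f_l):
    u = np.setdiff1d(np.arange(W.shape[0]), labeled)
    A = np.diag(W.sum(1))[np.ix_(u, u)] - W[np.ix_(u, u)]
    f = np.zeros(W.shape[0])
    f[labeled] = f_l
    f[u] = np.linalg.solve(A, W[np.ix_(u, labeled)] @ f_l)
    return f

def pick_query(W, labeled, f_l):
    f = harmonic(W, labeled, f_l)
    u = np.setdiff1d(np.arange(W.shape[0]), labeled)
    risk = lambda g: np.sum(np.minimum(g[u], 1 - g[u]))  # estimated risk
    best, best_risk = None, np.inf
    for k in u:
        expected = sum(p * risk(harmonic(W, np.append(labeled, k),
                                         np.append(f_l, yk)))
                       for yk, p in [(1.0, f[k]), (0.0, 1 - f[k])])
        if expected < best_risk:
            best, best_risk = k, expected
    return best

W = np.zeros((5, 5))                       # a 5-node chain graph
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
print("query node:", pick_query(W, labeled=np.array([0, 4]),
                                f_l=np.array([1.0, 0.0])))
```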
-
Xiaojin Zhu, John Lafferty, and Zoubin Ghahramani.
Semi-supervised learning: From Gaussian fields to Gaussian processes. Technical Report CMU-CS-03-175, Carnegie Mellon University, 2003.
We establish the connection between the inverse graph Laplacian and kernel Gram matrix, and learn hyperparameters for graph weights with evidence maximization. However, this is not a true Gaussian process since unseen points (not in training labeled and unlabeled data) are not handled well.
[ps.gz]
[pdf]
-
Xiaojin Zhu and Zoubin Ghahramani.
Learning from labeled and unlabeled data with label propagation.
Technical Report CMU-CALD-02-107, Carnegie Mellon University, 2002.
Precursor of the ICML03 paper. Introduces the intuition of label propagation, together with an iterative algorithm that amounts to the relaxation method (sketched below).
[ps.gz]
[pdf]
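The iteration itself fits in a few lines (a sketch; it converges to the harmonic solution whenever every connected component of the graph contains a labeled point):

```python
# Sketch: propagate labels to neighbors (f <- D^{-1} W f), then clamp the
# labeled points; repeat until (near) convergence.
import numpy as np

def label_propagation(W, labeled, f_l, iters=1000):
    P = W / W.sum(axis=1, keepdims=True)   # assumes every node has an edge
    f = np.zeros(W.shape[0])
    f[labeled] = f_l
    for _ in range(iters):
        f = P @ f
        f[labeled] = f_l                   # clamp the labeled points
    return f
```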
-
Xiaojin Zhu and Zoubin Ghahramani.
Towards semi-supervised classification with Markov random fields. Technical Report
CMU-CALD-02-106, Carnegie Mellon University, 2002.
Yet another precursor of the ICML03 paper. The graph is defined, but as a Boltzmann machine (discrete states) rather than the later Gaussian random fields (continuous states). Inference with MCMC is difficult.
[ps.gz]
[pdf]
Latent Topic Models
-
David Andrzejewski, Xiaojin Zhu, and Mark Craven.
Incorporating domain knowledge into topic modeling via Dirichlet forest priors.
In The 26th International Conference on Machine Learning (ICML), 2009.
Allowing Must-links and Cannot-links on words in LDA topics,
by replacing the Dirichlet prior with a mixture of Dirichlet trees.
[pdf]
-
David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu.
Statistical debugging using latent topic models.
In Proceedings of the 18th European Conference on Machine Learning (ECML), 2007.
Representing software execution traces using "bag-of-words", where the words are instrumented probes in the software. A Delta-Latent-Dirichlet-Allocation (ΔLDA) model to identify weak latent topics that correspond to distinct software bugs.
[pdf]
-
Jordan Boyd-Graber, David Blei, and Xiaojin Zhu.
A topic model for word sense disambiguation. In
Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL), 2007.
[pdf]
Machine Learning for Cognitive Science
-
Bryan Gibson, Xiaojin Zhu, Tim Rogers, Chuck Kalish, and Joseph Harrison.
Humans learn using manifolds, reluctantly.
In Advances in Neural Information Processing Systems (NIPS) 23, 2010.
Humans can learn the two-moon dataset, if we give them 4 (but not 2) labeled points and clue them in on the graph.
[pdf]
-
Xiaojin Zhu, Bryan R. Gibson, Kwang-Sung Jun, Timothy T. Rogers, Joseph Harrison, and Chuck Kalish.
Cognitive models of test-item effects in human category learning.
In The 27th International Conference on Machine Learning (ICML), 2010.
Two people with exactly the same training may classify a test item differently, depending on what other test items they are asked to classify (without label feedback). We explain such Test-Item Effect with online semi-supervised learning, which extends the exemplar, the prototype and the rational models of categorization.
[paper pdf]
-
Xiaojin Zhu, Timothy Rogers, and Bryan Gibson.
Human Rademacher Complexity.
In Advances in Neural Information Processing Systems (NIPS) 22, 2009.
A student in your class keeps nodding and can recite everything you said.
How do you know whether he has truly learned the material, or is simply overfitting your lecture?
We offer a measure that combines computational learning theory and cognitive psychology to gauge human generalization abilities.
[paper pdf |
the Shape domain images |
the Word domain text |
task=WordLength example subject file]
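The quantity being adapted is empirical Rademacher complexity: how well a learner can correlate with purely random labels. A sketch of the machine-side estimate (with a 1-NN classifier as a stand-in subject; in the paper the "hypothesis class" is a person):

```python
# Sketch: estimate empirical Rademacher complexity by repeatedly asking a
# learner to fit random +/-1 "labels" and averaging its correlation with
# that noise on the same data. Values near 1 mean the learner can
# memorize anything -- and is therefore prone to overfitting.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def empirical_rademacher(X, make_learner, draws=50, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(draws):
        sigma = rng.choice([-1, 1], size=len(X))        # random labels
        h = make_learner().fit(X, sigma)
        vals.append(np.mean(sigma * h.predict(X)))      # fit to the noise
    return float(np.mean(vals))

X = np.random.default_rng(1).standard_normal((40, 2))
print(empirical_rademacher(X, lambda: KNeighborsClassifier(n_neighbors=1)))
```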
-
Rui Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Timothy Rogers, and Xiaojin Zhu.
Human active learning.
In Advances in Neural Information Processing Systems (NIPS) 21, 2008.
Can humans perform and benefit from active learning in categorization tasks? We conduct behavioral experiments and compare humans' learning rate to predictions by statistical learning theory.
The short answer is Yes.
[preprint: pdf]
-
Xiaojin Zhu, Michael Coen, Shelley Prudom, Ricki Colman, and Joseph Kemnitz.
Online learning in monkeys.
In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.
(short paper)
We compare rhesus monkeys playing the Wisconsin Card Sorting Task to online machine learning algorithms.
[pdf]
-
Xiaojin Zhu, Timothy Rogers, Ruichen Qian, and Chuck Kalish.
Humans perform semi-supervised classification too.
In Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
We show that humans determine class boundaries using both labeled and unlabeled data, just like certain semi-supervised machine learning models.
[pdf]
[data]
Natural Language Processing
-
Andrew Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson, and Xiaojin Zhu.
May all your wishes come true: A study of wishes and how to recognize them.
In North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT), 2009.
People from around the world offered up their wishes to be printed on confetti and dropped from the sky during the famous New Year's Eve "ball drop" in New York City's Times Square. We present an in-depth analysis of this collection of wishes. We then leverage this unique resource to conduct the first study on building general "wish detectors" for natural language text.
[pdf]
-
Xiaojin Zhu, Zhiting Xu, and Tushar Khot.
How creative is your writing? A linguistic creativity measure from computer science and cognitive psychology perspectives.
In NAACL 2009 Workshop on Computational Approaches to Linguistic Creativity, 2009.
Predict the creativity of text using linear regression, with features extracted from the Google Web 1T 5-gram corpus, WordNet, and the Leuven word norms.
[pdf]
[data]
-
Xiaojin Zhu, Andrew B. Goldberg, Michael Rabbat, and Robert Nowak.
Learning bigrams from unigrams.
In The 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL), 2008.
If I give you a text document in bag-of-word (unigram count vector) format, you will not know the order between words.
What if I give you 10,000 documents, each in bag-of-word format?
Surprisingly, we can partially recover a bigram language model just from these bag-of-word documents.
[pdf]
-
Xiaojin Zhu, Andrew Goldberg, Jurgen Van
Gael, and David Andrzejewski.
Improving diversity in ranking using absorbing random walks. In Human Language Technologies: The Annual Conference
of the North American Chapter of the Association for Computational Linguistics
(NAACL-HLT), 2007.
A ranking algorithm (GRASSHOPPER) that is similar to PageRank but encourages diversity in top ranked items, by turning already ranked items into absorbing states to penalize remaining similar items.
[pdf]
[code]
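A compact sketch of the ranking loop (with a uniform teleport in place of the paper's user-supplied prior over items):

```python
# Sketch: rank the first item by the walk's stationary distribution; then
# repeatedly absorb ranked items and rank the next by the expected number
# of visits before absorption (high visits = important AND far from the
# already-ranked items).
import numpy as np

def grasshopper_rank(W, k, damping=0.9):
    n = W.shape[0]
    P = damping * W / W.sum(1, keepdims=True) + (1 - damping) / n
    pi = np.full(n, 1.0 / n)
    for _ in range(1000):                   # power iteration
        pi = pi @ P
    ranked = [int(np.argmax(pi))]
    while len(ranked) < k:
        rest = np.setdiff1d(np.arange(n), ranked)
        Q = P[np.ix_(rest, rest)]           # transitions among survivors
        N = np.linalg.inv(np.eye(len(rest)) - Q)   # fundamental matrix
        visits = N.sum(axis=0)              # expected visits before absorption
        ranked.append(int(rest[np.argmax(visits)]))
    return ranked

# Two tight clusters: the second pick jumps to the other cluster.
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
print(grasshopper_rank(W, k=2))
```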
-
Jurgen Van Gael and Xiaojin Zhu.
Correlation clustering for crosslingual link detection.
In International Joint Conference on Artificial Intelligence (IJCAI), 2007.
Cluster news articles in different languages by event. A practical implementation of correlation clustering that involves linear program chunking.
[pdf][data]
-
Gregory Druck, Chris Pal, Xiaojin Zhu, and Andrew McCallum.
Semi-supervised classification with hybrid generative/discriminative methods.
In The Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
[pdf]
-
Jordan Boyd-Graber, David Blei, and Xiaojin Zhu.
A topic model for word sense disambiguation. In
Conference on Empirical Methods in Natural Language Processing (EMNLP-CoNLL), 2007.
[pdf]
-
SaiSuresh Krishnakumaran and Xiaojin
Zhu.
Hunting elusive metaphors using lexical resources.
In NAACL 2007
Workshop on Computational Approaches to Figurative Language, 2007.
Identify "The soldier is a lion" as a metaphor by noting the lack of WordNet hyponym relationship between "soldier" and "lion".
Extends to verb-noun or adjective-noun pairs using Google Web 1T bigram counts.
[pdf]
[data]
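A minimal NLTK sketch of the hyponym test for "An X is a Y" sentences (this assumes the WordNet corpus has been downloaded; the verb-noun and adjective-noun extensions with Web 1T counts are not shown):

```python
# Sketch: "An X is a Y" is taken literally if some noun sense of X lies
# below some noun sense of Y in WordNet's hypernym hierarchy, and flagged
# as a possible metaphor otherwise.
from nltk.corpus import wordnet as wn

def is_hyponym_of(word, category):
    targets = set(wn.synsets(category, pos=wn.NOUN))
    for sense in wn.synsets(word, pos=wn.NOUN):
        ancestors = {h for path in sense.hypernym_paths() for h in path}
        if targets & ancestors:
            return True
    return False

for x, y in [("soldier", "lion"), ("oak", "tree")]:
    verdict = "literal" if is_hyponym_of(x, y) else "possible metaphor"
    print(f'"The {x} is a {y}": {verdict}')
```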
-
Ronald Rosenfeld, Stanley Chen, and Xiaojin
Zhu. Whole-sentence exponential language models: a vehicle for linguistic-statistical integration.
Computer Speech and Language, 15(1), 2001.
Directly model the probability of a whole sentence with an exponential model, instead of applying the chain rule word by word. Can use arbitrary, long-range features.
[pdf]
-
Xiaojin Zhu and Ronald Rosenfeld.
Improving trigram language modeling with the World Wide Web. In Proceedings of
the International Conference on Acoustics, Speech and Signal Processing
(ICASSP), 2001.
Estimating n-gram probabilities by submitting word sequences as phrase queries to search engines.
[pdf]
[tech report version CMU-CS-00-171 ps]
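The estimate itself is a ratio of phrase-hit counts. A sketch with a deliberately hypothetical hit_count stub, since the search interfaces used in the paper have long since changed:

```python
# Sketch: p(w3 | w1 w2) estimated from web counts of the quoted phrases
# "w1 w2 w3" and "w1 w2". hit_count is a hypothetical stub -- wire it to
# whatever phrase-count source you have (a search API, an n-gram corpus).
def hit_count(phrase: str) -> int:
    """Hypothetical: number of web hits for the exact quoted phrase."""
    raise NotImplementedError("plug in a real count source here")

def web_trigram_prob(w1: str, w2: str, w3: str) -> float:
    tri = hit_count(f'"{w1} {w2} {w3}"')
    bi = hit_count(f'"{w1} {w2}"')
    return tri / bi if bi > 0 else 0.0
```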
-
Xiaojin Zhu, Stanley F. Chen, and Ronald
Rosenfeld.
Linguistic features for whole sentence maximum entropy language
models. In Proceedings of the 5th European Conference on Speech Communication
and Technology (Eurospeech), 1999.
Parse a real corpus and a trigram-generated corpus using a shallow parser. Identify features that behave differently in the two corpora. Use them to build a better language model.
[ps]
Human Computer Interfaces
-
Arthur Glenberg, Andrew B. Goldberg, and Xiaojin Zhu.
Improving early reading comprehension using embodied CAI.
Instructional Science, 2009.
[link]
-
Andrew B. Goldberg, Jake Rosin, Xiaojin Zhu, and Charles R. Dyer.
Toward Text-to-Picture Synthesis.
In NIPS 2009 Symposium on Assistive Machine Learning for People with Disabilities, 2009.
[pdf]
-
Andrew B. Goldberg, Xiaojin Zhu, Charles R. Dyer, Mohamed Eldawy, and Lijie Heng.
Easy as ABC? Facilitating pictorial communication via semantically enhanced layout.
In Twelfth Conference on Computational Natural Language Learning (CoNLL), 2008.
If you have pictures for individual words in a sentence, how do you compose them to best convey the meaning of the sentence? We learn an "ABC" layout using semantic role labeling and conditional random fields, and conduct a user study.
[pdf]
-
Xiaojin Zhu, Andrew Goldberg, Mohamed Eldawy, Charles Dyer, and Bradley Strock.
A text-to-picture synthesis system for augmenting communication.
In The Integrated Intelligence Track of the Twenty-Second AAAI Conference on Artificial Intelligence (AAAI-07), 2007.
Synthesizing a picture from general, unrestricted natural language text, to convey the gist of the text.
[pdf]
-
Stefanie Shriver, Arthur Toth, Xiaojin
Zhu, Alex Rudnicky, and Roni Rosenfeld.
A unified design for human-machine
voice interaction. In Human Factors in Computing Systems (CHI).
ACM Press, 2001.
In order for humans to use speech interfaces, they might need to learn how to speak to machines.
[ps]
-
Ronald Rosenfeld, Xiaojin Zhu, Stefanie
Shriver, Arthur Toth, Kevin Lenzo, and Alan Black.
Towards a universal
speech interface. In International Conference on Spoken Language Processing
(ICSLP), 2000.
A general speech input paradigm that attempts to structure human speech to facilitate speech recognition.
[pdf]
-
Xiaojin Zhu, Jie Yang, and Alex Waibel.
Segmenting hands of arbitrary color.
In Fourth IEEE International Conference on
Automatic Face and Gesture Recognition, 2000.
We model the colors of a scene with a Gaussian mixture model, where one mixture component is the hand.
[ps.gz]
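A small scikit-learn sketch of the idea (the seed color, the component count, and the random frame are illustrative assumptions):

```python
# Sketch: fit a Gaussian mixture to a frame's pixel colors and take the
# component whose mean is closest to a rough hand-color seed as the mask.
import numpy as np
from sklearn.mixture import GaussianMixture

def segment_hand(frame_rgb, hand_seed=(200, 150, 130), n_components=3):
    h, w, _ = frame_rgb.shape
    pixels = frame_rgb.reshape(-1, 3).astype(float)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(pixels)
    hand_k = np.argmin(np.linalg.norm(gmm.means_ - np.array(hand_seed), axis=1))
    return (gmm.predict(pixels) == hand_k).reshape(h, w)

frame = np.random.default_rng(0).integers(0, 256, size=(60, 80, 3))
print("hand pixels:", int(segment_hand(frame).sum()))
```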
-
Jie Yang, Xiaojin Zhu, Ralph Gross, John
Kominek, Yue Pan, and Alex Waibel.
Multimodal people ID for multimedia
meeting browser. In The Seventh ACM International Multimedia Conference,
1999.
Use face recognition, speaker identification, color histogram, and sound direction to identify meeting participants.
[link]
Applications of Statistical Machine Learning
-
Nathan Rosenblum, Barton Miller, and Xiaojin Zhu.
Extracting compiler provenance from program binaries.
In Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program Analysis for Software Tools and Engineering (PASTE), 2010.
[paper pdf]
-
Nathan Rosenblum, Xiaojin Zhu, Barton Miller, and Karen Hunt.
Learning to analyze binary computer code.
In Twenty-Third AAAI Conference on Artificial Intelligence (AAAI-08), 2008.
An extended version of the NIPS07 workshop paper, including high throughput computing and a formal analysis of self-repairing disassembly.
[pdf]
-
Nathan Rosenblum, Xiaojin Zhu, Barton Miller, and Karen Hunt.
Machine Learning-Assisted Binary Code Analysis.
In NIPS workshop on Machine Learning in Adversarial Environments for Computer Security, 2007.
Identify function entry points in binary code using Markov Random Fields on both local instruction patterns and global control flow structures.
-
David Andrzejewski, Anne Mulhern, Ben Liblit, and Xiaojin Zhu.
Statistical debugging using latent topic models.
In Proceedings of the 18th European Conference on Machine Learning (ECML), 2007.
Representing software execution traces using "bag-of-words", where the words are instrumented probes in the software. A Delta-Latent-Dirichlet-Allocation (ΔLDA) model to identify weak latent topics that correspond to distinct software bugs.
[pdf]
-
Mariyam Mirza, Joel Sommers, Paul Barford,
and Xiaojin Zhu. A machine learning approach to TCP throughput prediction.
In The International Conference on Measurement and Modeling of Computer
Systems (ACM SIGMETRICS), 2007.
Apply Support Vector Regression to predict Internet file transfer rate from measurable features of the network.
[pdf]
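A minimal scikit-learn sketch with made-up features and synthetic data (the paper predicts from real path measurements):

```python
# Sketch: Support Vector Regression from network measurements to
# throughput. The three features and the synthetic data are illustrative.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
# Hypothetical features: [available bandwidth, queuing delay, loss rate]
X = rng.uniform([1, 1, 0.0], [100, 200, 0.05], size=(500, 3))
tput = 0.8 * X[:, 0] - 0.1 * X[:, 1] - 300 * X[:, 2] + rng.normal(0, 2, 500)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.5))
model.fit(X[:400], tput[:400])
print("mean abs error:", np.mean(np.abs(model.predict(X[400:]) - tput[400:])))
```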