CS 838-1: Advanced Natural Language Processing
This course has two themes: applications in natural language processing,
and statistical machine learning methods. The applications
include text categorization, document summarization, sentiment analysis,
word sense disambiguation, machine translation, and speech recognition. The
techniques include basic information theory and probabilistic modeling,
Expectation-Maximization, Support Vector Machines, probabilistic Context
Free Grammars, Hidden Markov Models, Conditional Random Fields, latent
Dirichlet allocation, graphical models (Markov Chain Monte Carlo and variational
inference), link analysis, and semi-supervised learning. The learning
methods are also applicable to bioinformatics, computer vision, computer
code analysis, and other fields. This course counts as core AI credit.
Please feel free to send me email; I usually respond quickly.
Lectures: 11:00am-12:15pm TR, CS 1325
Office hours: Thursday 3pm-4pm, CS 4369
Class mailing list:
Email the instructor:
TA: Chi-Man Liu, email@example.com
Course Outline and Readings
The order and exact content are subject to change.
Review of mathematical background
[cB] (see 'Books' below) 1.2, Appendix B, C, E
[dM] 2 or [MS] 2.1
Iain Murray's excellent crib sheet
Statistics of the English language
[MS] 4.2, 1.4.2, 1.4.3
Wentian Li, Comments
to "Bell Curves and Monkey Languages", 1996
Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution.
IEEE Transactions on Information Theory, 38(6), 1842-1845, 1992
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical
learning by 8-month-old infants. Science, 274, 1926-1928.
Lillian Lee. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001.
Computer Science: Reflections on the Field, Reflections from the Field, pp. 111--118, 2004.
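Zipf's law says the frequency of the r-th most frequent word is roughly proportional to 1/r, so rank times frequency should stay roughly constant. A minimal Python sketch (the corpus is a toy engineered to be exactly Zipfian, for illustration only):

```python
from collections import Counter

def zipf_table(tokens, top=10):
    """Rank words by frequency; under Zipf's law, rank * frequency
    should stay roughly constant across ranks."""
    ranked = Counter(tokens).most_common(top)
    return [(rank, word, freq, rank * freq)
            for rank, (word, freq) in enumerate(ranked, start=1)]

# Toy corpus with counts 12, 6, 4, 3: exactly Zipfian.
corpus = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
for row in zipf_table(corpus):
    print(row)  # rank * frequency is 12 in every row
```

On real English text (as in the [MS] readings) the product drifts, but far less than the raw frequencies do.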
Language modeling, smoothing
The entropy of a language, information theory
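As a toy illustration of smoothing in n-gram language models (not tied to any particular reading), here is an add-one (Laplace) smoothed bigram model; real systems use better schemes such as Kneser-Ney, but the mechanics are the same:

```python
from collections import Counter

def train_bigram(tokens):
    """Collect unigram counts, bigram counts, and the vocabulary."""
    return Counter(tokens), Counter(zip(tokens, tokens[1:])), set(tokens)

def smoothed_prob(w_prev, w, unigrams, bigrams, vocab):
    """Add-one (Laplace) smoothed bigram probability P(w | w_prev)."""
    return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + len(vocab))

uni, bi, V = train_bigram("the cat sat on the mat".split())
# An unseen bigram such as ("cat", "the") still gets nonzero probability,
print(smoothed_prob("cat", "the", uni, bi, V))
# and the distribution over next words still sums to 1 (up to rounding):
print(sum(smoothed_prob("the", w, uni, bi, V) for w in V))
```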
Information retrieval and link analysis
John Lafferty and Chengxiang Zhai, Probabilistic
relevance models based on document and query generation, In Language
Modeling and Information Retrieval, Kluwer International Series on Information
Retrieval, Vol. 13, 2003.
ChengXiang Zhai, John Lafferty, A
study of smoothing methods for language models applied to information retrieval,
ACM Transactions on Information Systems, Vol. 22, No. 2, April 2004.
the Lemur toolkit
The PageRank Citation
Ranking: Bringing Order to the Web. Lawrence Page, Sergey Brin,
Rajeev Motwani, and Terry Winograd. Stanford Digital Library Technologies Project, 1998.
Jon M. Kleinberg. Authoritative
sources in a hyperlinked environment. Journal of the ACM, 46(5), 604-632, 1999.
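The core of PageRank is a simple power iteration. A self-contained sketch on a toy graph (the paper's full formulation handles teleportation and dangling pages at web scale; this is only the basic recurrence):

```python
def pagerank(links, damping=0.85, iters=50):
    """Power-iteration PageRank on an adjacency dict {page: [outlinks]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        # Every page gets the teleportation mass (1 - d) / n ...
        new = {p: (1 - damping) / n for p in pages}
        # ... plus a damped share of rank from each page linking to it.
        for p, outs in links.items():
            targets = outs if outs else pages  # dangling pages spread evenly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank

# Toy web: both A and B link to C, so C should come out on top.
toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy)
print(max(ranks, key=ranks.get))  # C
```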
Document summarization
- P. Turney. Learning to extract keyphrases from text. Technical report, National Research Council, Institute for Information Technology, 1999.
- A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proc. Conf. Empirical Methods in Natural Language Processing, 2003.
- R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proc. Conf. Empirical Methods in Natural Language Processing, 2004.
- G. Erkan and D. Radev. 2004. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research.
- X. Zhu, A. Goldberg, J. Van Gael and D. Andrzejewski.
Improving Diversity in Ranking using Absorbing Random Walks. NAACL-HLT, 2007.
Text categorization: Naive Bayes, logistic regression
- [cB] 8.1, 8.2 for Naive Bayes; 4.3 for logistic regression.
Comparison of Event Models for Naive Bayes Text Classification. Andrew
McCallum and Kamal Nigam. AAAI-98 Workshop on "Learning for Text Categorization".
Andrew McCallum's rainbow
statistical text classification code
Adam Berger, Stephen Della Pietra, and Vincent Della Pietra, 1996. A
maximum entropy approach to natural language processing. Computational Linguistics, 22(1).
Ronald Rosenfeld. A
Maximum Entropy Approach to Adaptive Statistical Language Modeling.
Computer Speech and Language, 10, 187--228, 1996.
Stanley Chen and Ronald Rosenfeld. Efficient
Sampling and Feature Selection in Whole Sentence Maximum Entropy Language
Models. In Proc. ICASSP '99, Phoenix, Arizona, March 1999.
Zhang Le's MaxEnt toolkit
Y. Dan Rubinstein and Trevor Hastie, 1997. Discriminative
vs Informative Learning. Proc. of KDD.
Andrew Y. Ng and Michael Jordan, 2002. On
discriminative vs. generative classifiers: A comparison of logistic regression
and Naive Bayes. Proc. of NIPS.
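A minimal multinomial Naive Bayes text classifier with add-one smoothing, on made-up toy data (McCallum & Nigam's paper compares this multinomial event model against a multivariate Bernoulli one):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Multinomial Naive Bayes with add-one smoothing.
    docs: list of (token_list, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, class_counts, word_counts, vocab):
    """Pick argmax over classes of log P(c) + sum_w log P(w | c)."""
    n_docs = sum(class_counts.values())
    def logprob(label):
        lp = math.log(class_counts[label] / n_docs)
        total = sum(word_counts[label].values())
        return lp + sum(math.log((word_counts[label][w] + 1) /
                                 (total + len(vocab))) for w in tokens)
    return max(class_counts, key=logprob)

docs = [("good great fun".split(), "pos"),
        ("bad awful boring".split(), "neg")]
model = train_nb(docs)
print(predict_nb("great fun".split(), *model))     # pos
print(predict_nb("awful boring".split(), *model))  # neg
```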
Sentiment, humor, gender analysis with Support Vector Machines
- [cB] 7.1
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
Thumbs up? Sentiment Classification using Machine Learning Techniques.
- Rada Mihalcea and Carlo Strapparava.
Making Computers Laugh: Investigations in Automatic Humor Recognition.
- Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni.
Automatically Categorizing Written Texts by Author Gender.
Literary and Linguistic Computing 17(4), November 2002, pp. 401-412.
Chris Burges. A
Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2), 1998.
Alex J. Smola and Bernhard Schölkopf. A
Tutorial on Support Vector Regression, NeuroCOLT Technical Report TR-98-030.
Thorsten Joachims' SVM-light code
Word sense disambiguation: unlabeled text as knowledge source
- [cB] 9 for the EM algorithm.
Self-training for word sense disambiguation: David Yarowsky, 1995. Unsupervised
word sense disambiguation rivaling supervised methods, Proceedings
of the 33rd Annual Meeting of the Association for Computational Linguistics.
Text Classification from Labeled and Unlabeled Documents using EM. Kamal
Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning,
39(2/3). pp. 103-134. 2000.
Combining Labeled and Unlabeled Data with Co-Training. Avrim Blum and Tom Mitchell. Proceedings
of the 11th Annual Conference on Computational Learning Theory, pages 92--100, 1998.
T. Joachims, Transductive
Inference for Text Classification using Support Vector Machines. Proceedings
of the International Conference on Machine Learning (ICML), 1999.
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin
Zhu, Zoubin Ghahramani, John Lafferty. The Twentieth International
Conference on Machine Learning (ICML-2003)
Semi-Supervised Learning Literature Survey. Xiaojin Zhu, Computer Sciences TR 1530,
University of Wisconsin - Madison.
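The self-training idea behind Yarowsky's algorithm can be sketched generically: train on the labeled seed, promote confident predictions on unlabeled data to training examples, and retrain. The 1-D nearest-mean classifier below is a stand-in for a real WSD classifier, not anything from the readings:

```python
def self_train(labeled, unlabeled, train_fn, classify_fn,
               threshold=0.8, rounds=10):
    """Generic self-training loop in the spirit of Yarowsky (1995).
    train_fn(labeled) -> model; classify_fn(model, x) -> (label, confidence)."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        model = train_fn(labeled)
        next_pool = []
        for x in pool:
            label, conf = classify_fn(model, x)
            if conf >= threshold:
                labeled.append((x, label))  # promote confident prediction
            else:
                next_pool.append(x)
        if len(next_pool) == len(pool):  # nothing new was labeled; stop
            break
        pool = next_pool
    return train_fn(labeled), labeled

def train_means(data):
    """Toy 1-D classifier: each class is summarized by its mean."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def classify_nearest(model, x):
    """Nearest class mean; confidence grows with the margin between means."""
    dists = sorted((abs(x - m), y) for y, m in model.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 / (d0 + d1 + 1e-9)

model, data = self_train([(0.0, "a"), (10.0, "b")], [1.0, 9.0, 2.0, 8.0],
                         train_means, classify_nearest)
print(sorted(model.items()))  # class means after absorbing unlabeled points
```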
Semantic space via probabilistic Latent Semantic Analysis, latent Dirichlet allocation
Part of Speech tagging with Hidden Markov Models
Information extraction with Conditional Random Fields
Parsing and context free grammars
Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della
Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry
Printz, and Lubos Ures, 1994. The
Candide System for Machine Translation. Proceedings of the 1994 ARPA
Workshop on Human Language Technology
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert
L. Mercer, 1993. The
Mathematics of Statistical Machine Translation. Computational Linguistics
19(2), pp. 263--311.
Papineni, Roukos, Ward, Zhu. Bleu:
a Method for Automatic Evaluation of Machine Translation. Proceedings
of the 40th Annual Meeting of the Association for Computational Linguistics
(ACL), Philadelphia, July 2002, pp. 311-318.
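A simplified sentence-level BLEU with a single reference, following the paper's recipe of modified n-gram precisions combined by a geometric mean and a brevity penalty (real implementations work at corpus level and add smoothing for short sentences):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Single-reference sentence BLEU, after Papineni et al. (2002)."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        # Modified precision: clip each n-gram count by its reference count.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    if min(precisions) == 0:  # any zero precision makes the score 0
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(log_avg)

ref = "the cat is on the mat".split()
print(bleu(ref, ref))  # identical sentences score 1.0
print(bleu("the cat".split(), ref))  # short fragment scores much lower
```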
Textbooks: [cB] Christopher M. Bishop, Pattern
Recognition and Machine Learning. Springer Verlag, 2006.
[MS] Manning & Schutze, Foundations
of statistical natural language processing, the MIT press, 1999.
[JM] Jurafsky & Martin, Speech
and language processing, Prentice Hall, 2000.
[dM] David MacKay. Information
Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
Related courses:
- language and statistics,
- technologies, Callan, Black, Lavie, CMU
- for NLP, Lavie, Frederking, CMU
- speech recognition and understanding, Schultz, Waibel, CMU
- retrieval, Callan, Yang, CMU
- text data mining, learning to turn words into data, Cohen, CMU
- learning for text analysis, Craven, Shavlik, U Wisconsin
- to natural language processing, McCallum, U Mass
- for artificial intelligence, McAllester, TTI-C
- natural language processing: models and methods, Lee, Cornell
- processing, Lee, Cornell
- processing, Cardie, Cornell
- statistical foundations of machine learning, Lafferty, Wasserman, CMU
- in Natural Language Processing, Koehn, Edinburgh
- Natural Language Processing, Mihalcea, University of North Texas
- Topics in Natural Language Processing, Ringger, BYU
- introduction to bioinformatics, Craven, U Wisconsin
- Craven, U Wisconsin
- Shavlik, U Wisconsin
- advanced methods in artificial intelligence, Page, U Wisconsin
This year we will allow several types of projects:
It is OK to work in groups, but each group member should make clear his/her distinct intellectual contribution.
- Completely open. If you have an idea for an interesting project, please send me email / talk to me. Last year's projects can be found here for your reference.
- Semi-open. Consider the Web contents of all users in our department. These can be easily obtained (if you have a CS account) in each user's ~/public/html directory. Do something interesting and creative with it.
- Focused. I will propose a few concrete project ideas:
- Given a researcher's name (e.g., Michael Jordan), find the person's Web
page, build a statistical profile for the person (e.g., a unigram language
model from the researcher's publication pdf files). With many profiles
from different researchers, we can ask questions like "who is best to ask
for this question", "who can review this paper", etc.
- Paper search. Given a PDF paper file, identify near-identical papers in a collection of PDF files or online.
- Satire recognition. For example, distinguish articles from The Onion vs. CNN.
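For the researcher-profile idea, a minimal sketch of matching a query against per-person unigram models (the profile names and texts below are made up for illustration; a real project would first extract text from publication PDF files):

```python
import math
from collections import Counter

def unigram_profile(text):
    """Word counts standing in for a researcher's publication text."""
    return Counter(text.lower().split())

def query_logprob(query, profile, vocab_size):
    """Add-one smoothed unigram log-probability of the query under a profile."""
    total = sum(profile.values())
    return sum(math.log((profile[w] + 1) / (total + vocab_size))
               for w in query.lower().split())

# Made-up mini profiles for illustration only.
profiles = {
    "graphics": unigram_profile("rendering shading mesh texture rendering"),
    "nlp": unigram_profile("parsing tagging corpus grammar parsing"),
}
vocab = {w for p in profiles.values() for w in p}
query = "parsing a corpus"
best = max(profiles,
           key=lambda name: query_logprob(query, profiles[name], len(vocab)))
print(best)  # the "nlp" profile assigns the query higher probability
```

The same ranking-by-likelihood machinery then answers questions like "who is best to ask" or "who can review this paper".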
"Quality" class participation 5%.
"I'm right now working at Google in the Search Quality team. I see a
lot of concepts we covered in CS838 being used here. Thanks again for offering
this course!"
-- former student
"The best part of the course was the classes. Teaching was very good. Even the very difficult concepts appeared simple."
-- former student
"Dude, it is big... but lovely."
-- word samples from a unigram language model trained on movie reviews (punctuation added)