CS 838-1: Advanced Natural Language Processing
Spring 2007
This course has two themes: applications in natural language processing,
and statistical machine learning methods. The applications include text
categorization, document summarization, sentiment analysis, word sense
disambiguation, machine translation, and speech recognition, while the
techniques include basic information theory and probabilistic modeling,
Expectation-Maximization, Support Vector Machines, probabilistic
context-free grammars, Hidden Markov Models, Conditional Random Fields,
latent Dirichlet allocation, graphical models (Markov Chain Monte Carlo
and variational inference), link analysis, and semi-supervised learning.
The learning methods are also applicable to bioinformatics, computer
vision, computer code analysis, and other fields. This course counts as
core AI credit.
Schedule
Lecture: 11:00am-12:15pm TR, CS 1325
Office hour: Thursday 3pm-4pm, CS 4369
Class mailing list: compsci838-1-s07@lists.wisc.edu (archive)
Email the instructor: jerryzhu@cs.wisc.edu
Teaching assistant: Chi-Man Liu, cx@cs.wisc.edu
Please feel free to send me email; I usually respond quickly.
Course Outline and Readings
The order and exact content are subject to change.
- Review of mathematical background
- [cB] (see 'Books' below) 1.2, Appendix B, C, E
- [dM] 2 or [MS] 2.1
- Iain Murray's excellent crib sheet.
- Statistics of the English language
- [MS] 4.2, 1.4.2, 1.4.3
- Zipf's law, literature
- Wentian Li, Comments to "Bell Curves and Monkey Languages", 1996.
- Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6), 1842-1845, 1992.
- Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
- Lillian Lee. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. Computer Science: Reflections on the Field, Reflections from the Field, pp. 111--118, 2004.
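Zipf's law is easy to see empirically: rank words by frequency, and rank times frequency stays in the same ballpark. A minimal sketch (the text is an invented toy example; any large corpus shows the effect far more clearly):

```python
from collections import Counter

# Invented toy text; substitute any large text file for a real experiment.
text = """the quick brown fox jumps over the lazy dog the fox
runs and the dog sleeps while the quick fox jumps again"""

# Count word frequencies and sort from most to least frequent.
counts = Counter(text.split())
ranked = counts.most_common()

# Under Zipf's law the r-th most frequent word has frequency roughly
# proportional to 1/r, so rank * frequency is roughly constant.
for rank, (word, freq) in enumerate(ranked[:5], start=1):
    print(rank, word, freq, rank * freq)
```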
- Language modeling, smoothing
- The entropy of a language, information theory
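These two topics can be made concrete in a few lines: a unigram model with add-one (Laplace) smoothing, scored by the cross-entropy (in bits per word) it assigns to held-out text. The two word lists are invented toy data:

```python
import math
from collections import Counter

# Invented toy data: a tiny training text and a held-out test text.
train = "the cat sat on the mat".split()
test = "the dog sat".split()

vocab = set(train) | set(test)
counts = Counter(train)
N, V = len(train), len(vocab)

def prob(w):
    # Add-one (Laplace) smoothing: every word in the vocabulary,
    # seen in training or not, gets a nonzero probability.
    return (counts[w] + 1) / (N + V)

# Cross-entropy of the test text in bits per word; lower is better.
H = -sum(math.log2(prob(w)) for w in test) / len(test)
print(round(H, 3))
```

Smoothing is what keeps the unseen word "dog" from driving the cross-entropy to infinity.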
- Information retrieval and link analysis
- John Lafferty and Chengxiang Zhai, Probabilistic relevance models based on document and query generation, In Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003.
- ChengXiang Zhai, John Lafferty, A study of smoothing methods for language models applied to information retrieval, ACM Transactions on Information Systems, Vol. 22, No. 2, April 2004.
- [MS] 15
- the Lemur toolkit
- The PageRank Citation Ranking: Bringing Order to the Web. Lawrence Page and Sergey Brin and Rajeev Motwani and Terry Winograd. Stanford Digital Library Technologies Project. 1998.
- Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604--632, 1999.
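The PageRank computation in the Page et al. report reduces to a simple power iteration. A minimal sketch on an invented four-page graph, with the damping factor 0.85 used in the paper:

```python
# Power-iteration PageRank on an invented four-page web graph.
# links[i] lists the pages that page i points to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)
d = 0.85              # damping factor, as in the PageRank paper
pr = [1.0 / n] * n    # start from the uniform distribution

for _ in range(100):
    # Each page keeps (1-d)/n baseline mass, plus shares from in-links.
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        share = d * pr[page] / len(outs)
        for target in outs:
            new[target] += share
    pr = new

# Scores sum to 1; page 2, with the most in-links, ranks highest.
print([round(x, 3) for x in pr])
```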
- Document summarization
- P. Turney. Learning to extract keyphrases from text. Technical report, National Research Council, Institute for Information Technology, 1999.
- A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proc. Conf. Empirical Methods in Natural Language Processing, 2003.
- R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proc. Conf. Empirical Methods in Natural Language Processing, 2004.
- G. Erkan and D. Radev. 2004. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research.
- X. Zhu, A. Goldberg, J. Van Gael and D. Andrzejewski. Improving Diversity in Ranking using Absorbing Random Walks. NAACL-HLT, 2007.
- Text categorization: Naive Bayes, logistic regression
- [cB] 8.1, 8.2 for Naive Bayes; 4.3 for logistic regression.
- A Comparison of Event Models for Naive Bayes Text Classification. Andrew McCallum and Kamal Nigam. AAAI-98 Workshop on "Learning for Text Categorization".
- Andrew McCallum's rainbow statistical text classification code
- Adam Berger, Stephen Della Pietra, and Vincent Della Pietra, 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22(1).
- Ronald Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer, Speech and Language 10, 187--228, 1996.
- Stanley Chen and Ronald Rosenfeld. Efficient Sampling and Feature Selection in Whole Sentence Maximum Entropy Language Models. In Proc. ICASSP '99, Phoenix, Arizona, March 1999.
- Zhang Le's MaxEnt page
- Y. Dan Rubenstein and Trevor Hastie, 1997. Discriminative vs Informative Learning. Proc. of KDD.
- Andrew Y. Ng and Michael Jordan, 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and Naive Bayes. Proc. of NIPS.
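A multinomial Naive Bayes text classifier of the kind compared by McCallum and Nigam fits in a few lines. This sketch uses an invented toy sentiment-style training set and add-one smoothing:

```python
import math
from collections import Counter

# Invented toy training set: (document words, label).
docs = [
    ("good great fun".split(), "pos"),
    ("great story good acting".split(), "pos"),
    ("bad boring plot".split(), "neg"),
    ("boring bad bad".split(), "neg"),
]

labels = {y for _, y in docs}
vocab = {w for words, _ in docs for w in words}
word_counts = {y: Counter() for y in labels}
doc_counts = Counter()
for words, y in docs:
    word_counts[y].update(words)
    doc_counts[y] += 1

def score(words, y):
    # log P(y) + sum over words of log P(w | y), with add-one smoothing.
    total = sum(word_counts[y].values())
    s = math.log(doc_counts[y] / len(docs))
    for w in words:
        s += math.log((word_counts[y][w] + 1) / (total + len(vocab)))
    return s

def classify(words):
    return max(labels, key=lambda y: score(words, y))

print(classify("good fun".split()))
```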
- Sentiment, humor, gender analysis with Support Vector Machines
- [cB] 7.1
- Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP, 2002.
- Rada Mihalcea and Carlo Strapparava. Making Computers Laugh: Investigations in Automatic Humor Recognition. EMNLP, 2005.
- Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing 17(4), November 2002, pp. 401-412.
- Chris Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
- Alex J. Smola and Bernhard Scholkopf. A Tutorial on Support Vector Regression, NeuroCOLT Technical Report TR-98-030, 1998.
- Thorsten Joachims' SVM-light code
- Word sense disambiguation: unlabeled text as knowledge source
- [cB] 9 for the EM algorithm.
- Self-training for word sense disambiguation: David Yarowsky, 1995. Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189--196.
- Text Classification from Labeled and Unlabeled Documents using EM. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3), pp. 103-134, 2000.
- Combining Labeled and Unlabeled Data with Co-Training. Avrim Blum and Tom Mitchell. Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92--100, 1998.
- T. Joachims. Transductive Inference for Text Classification using Support Vector Machines. Proceedings of the International Conference on Machine Learning (ICML), 1999.
- Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin Zhu, Zoubin Ghahramani, John Lafferty. The Twentieth International Conference on Machine Learning (ICML-2003).
- Semi-Supervised Learning Literature Survey. Xiaojin Zhu. Computer Sciences TR 1530, University of Wisconsin - Madison.
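Yarowsky-style self-training can be sketched in miniature: train on a few labeled seeds, label the unlabeled example the model is most confident about, add it to the training set, and repeat. The contexts below are invented toy data, and a simple word-overlap scorer stands in for a real classifier:

```python
from collections import Counter

# Minimal self-training sketch in the spirit of Yarowsky (1995).
# Invented seed contexts for two senses of the ambiguous word "bank".
labeled = [
    ("river water".split(), "river"),
    ("loan money interest".split(), "finance"),
]
unlabeled = [
    "river water fish bank".split(),
    "loan money cash interest bank".split(),
    "fish swim near bank".split(),
]

def predict(words, model):
    # Score each sense by word overlap with its training contexts;
    # the margin between the senses serves as a confidence measure.
    scores = {y: sum(c[w] for w in words) for y, c in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best] - min(scores.values())

results = []
while unlabeled:
    model = {}
    for words, y in labeled:
        model.setdefault(y, Counter()).update(words)
    # Take the unlabeled context with the largest confidence margin.
    i = max(range(len(unlabeled)),
            key=lambda j: predict(unlabeled[j], model)[1])
    words = unlabeled.pop(i)
    sense, _ = predict(words, model)
    labeled.append((words, sense))
    results.append((" ".join(words), sense))

print(results)
```

Note that "fish swim near bank" shares no word with either seed; it is still labeled correctly because a confidently self-labeled example bridges the gap, which is the point of self-training.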
- Semantic space via probabilistic Latent Semantic Analysis, latent Dirichlet allocation
- Part of Speech tagging with Hidden Markov Models
- Information extraction with Conditional Random Fields
- Parsing and context free grammars
- Machine Translation
- Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, and Lubos Ures, 1994. The Candide System for Machine Translation. Proceedings of the 1994 ARPA Workshop on Human Language Technology.
- Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, 1993. The Mathematics of Statistical Machine Translation. Computational Linguistics 19(2), pp. 263--311.
- Papineni, Roukos, Ward, Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318.
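The Bleu score of Papineni et al. combines modified n-gram precision with a brevity penalty. A minimal sketch, restricted to unigrams and bigrams, a single reference, invented example sentences, and without the smoothing needed when some n-gram precision is zero:

```python
import math
from collections import Counter

def ngrams(words, n):
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))

def bleu(candidate, reference, max_n=2):
    # Modified n-gram precision: each candidate n-gram is credited
    # at most as many times as it appears in the reference.
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        precisions.append(overlap / max(sum(cand.values()), 1))
    # Brevity penalty discourages overly short candidates.
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

reference = "the cat is on the mat".split()
print(round(bleu("the cat sat on the mat".split(), reference), 3))
```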
Lecture notes
References
Books
- Textbook: [cB] Christopher M. Bishop, Pattern Recognition and Machine Learning. Springer, 2006.
- [MS] Manning & Schutze, Foundations of Statistical Natural Language Processing. The MIT Press, 1999.
- [JM] Jurafsky & Martin, Speech and Language Processing. Prentice Hall, 2000.
- [dM] David MacKay, Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.
Courses
- language and statistics, Rosenfeld, CMU
- human language technologies, Callan, Black, Lavie, CMU
- algorithms for NLP, Lavie, Frederking, CMU
- speech recognition and understanding, Schultz, Waibel, CMU
- information retrieval, Callan, Yang, CMU
- text data mining, Callan, CMU
- learning to turn words into data, Cohen, CMU
- machine learning for text analysis, Craven, Shavlik, U Wisconsin
- introduction to natural language processing, McCallum, U Mass
- statistical methods for artificial intelligence, McAllester, TTI-C
- statistical natural language processing: models and methods, Lee, Cornell
- natural language processing, Lee, Cornell
- natural language processing, Cardie, Cornell
- statistical foundations of machine learning, Lafferty, Wasserman, CMU
- Empirical Methods in Natural Language Processing, Koehn, Edinburgh
- Natural Language Processing, Mihalcea, University of North Texas
- Topics in Natural Language Processing, Ringger, BYU
- introduction to bioinformatics, Craven, U Wisconsin
- advanced bioinformatics, Craven, U Wisconsin
- machine learning, Shavlik, U Wisconsin
- advanced methods in artificial intelligence, Page, U Wisconsin
Grading
- Homework 35%.
- Exam 30%.
- Project 30%.
This year we will allow several types of projects:
- Completely open. If you have an idea for an interesting project, please send me email / talk to me. Last year's projects can be found here for your reference.
- Semi-open. Consider the Web contents of all users in our department. These can be easily obtained (if you have a CS account) in each user's ~/public/html directory. Do something interesting and creative with it.
- Focused. I will propose a few concrete project ideas:
- Given a researcher's name (e.g., Michael Jordan), find the person's Web page and build a statistical profile for the person (e.g., a unigram language model from the researcher's publication PDF files). With many profiles from different researchers, we can ask questions like "who is the best person to ask this question", "who can review this paper", etc.
- Paper search. Given a PDF paper file, identify near-identical papers in a collection of PDF files or online.
- Satire recognition. For example, distinguish articles from The Onion vs. CNN.
It is OK to work in groups, but each group member should make clear his or her distinct intellectual contribution.
- "Quality" class participation 5%.
Quotes
"I'm right now working at Google in the Search Quality team. I see a
lot of concepts we covered in CS838 being used here. Thanks again for offering
the course."
-- former student
"The best part of the course was the classes. Teaching was very good. Even the very difficult concepts appeared simple."
-- former student
"Dude, it is big... but lovely."
-- word samples from a unigram language model trained on movie reviews (punctuation added)