CS 838-1: Advanced Natural Language Processing

Spring 2007

This course has two themes: applications in natural language processing, and statistical machine learning methods. The applications include text categorization, document summarization, sentiment analysis, word sense disambiguation, machine translation, and speech recognition. The techniques include basic information theory and probabilistic modeling, Expectation-Maximization, Support Vector Machines, probabilistic context-free grammars, Hidden Markov Models, Conditional Random Fields, latent Dirichlet allocation, graphical models (Markov Chain Monte Carlo and variational inference), link analysis, and semi-supervised learning. The learning methods are also applicable to bioinformatics, computer vision, computer code analysis, and other fields. This course counts as core AI credit.

Schedule

Lecture: 11:00am-12:15pm TR, CS 1325
Office hour: Thursday 3pm-4pm, CS 4369
Class mailing list: compsci838-1-s07@lists.wisc.edu (archive)
Email the instructor: jerryzhu@cs.wisc.edu
Teaching assistant: Chi-Man Liu, cx@cs.wisc.edu

Instructor: Xiaojin (Jerry) Zhu

Please feel free to email me; I usually respond quickly.

Course Outline and Readings

The order and exact content are subject to change.

Lecture notes

(the week of)
Jan 23 mathematical background
Jan 30 words, Zipf's law, Miller's monkey
Feb 6 language modeling
Feb 13 information theory, information retrieval
Feb 20 link analysis
Feb 27 naive Bayes, logistic regression
March 6 the EM algorithm
March 20 SVMs, text summarization (Andrew Goldberg)
March 27 latent topic models
April 3 spring break, no class
April 10 hidden Markov models
April 17, 24 inference in graphical models
May 1 conditional random fields

References

Books

Courses

Grading

Quotes

"I'm right now working at Google in the Search Quality team.  I see lot of concepts we covered in CS838 being used here. Thanks again for offering the course." -- former student

"The best part of the course was the classes. Teaching was very good. Even the very difficult concepts appeared simple." -- former student

"Dude, it is big... but lovely." -- word samples from a unigram language model trained on movie reviews (punctuations added)