CS 769: Advanced Natural Language Processing
Project
A good project applies one or more methods discussed in class to an NLP (or your own research) problem.
The amount of work is expected to be equivalent to 3 to 5 homework assignments.
An excellent project addresses an issue with social impact, and/or is creative.
Most projects should be individual; group projects are allowed with instructor approval.
What we want:
1. A short (1-2 paragraphs) proposal (3/24)
2. Project update: short chat in my office (in April)
3. A project report, 2-page extended abstract style (May 12).
Please use the Word or LaTeX template for AAAI, available at
http://www.aaai.org/Publications/Author/author.php.
The length limit is 2 pages, plus 1 additional page for references.
4. A poster session (May 9, 1-3pm)
2008 projects
2007 projects
2006 projects
ACL wiki, in particular the "Resources" tab.
Recent conferences: ACL 2007, EMNLP 2007, NAACL 2007, KDD 2007
A few possible projects (You are strongly encouraged to propose your own project ideas instead):
* Do something interesting to a non-conventional text collection, including
software bug reports, network logs, conference proceedings in your area, etc.
* Do something interesting to the cs.wisc.edu dataset (available here).
* Do something interesting to the Google 1T 5-gram data
* Present news on Google Maps (location extraction, summarization, visualization, clutter avoidance)
* Apply one technique (language modeling, classification, clustering, latent topic modeling, sequence
modeling, etc.) to one of the data collections in the ACL wiki
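As a starting point for the last idea, the following is a minimal sketch rather than course-provided code. It assumes whitespace tokenization and an add-one smoothed unigram language model, and it scores a held-out sentence by perplexity. The toy sentences, the function names, and the single extra vocabulary slot reserved for unseen words are all illustrative assumptions.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Count token occurrences; return (counts, total number of tokens)."""
    counts = Counter(tokens)
    return counts, sum(counts.values())


def perplexity(tokens, counts, total, vocab_size):
    """Perplexity of `tokens` under an add-one smoothed unigram model."""
    log_prob = 0.0
    for tok in tokens:
        # Counter returns 0 for unseen tokens, so they get probability 1 / (total + V).
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / len(tokens))


# Toy data (hypothetical); replace with a real collection.
train = "the cat sat on the mat".split()
test = "the dog sat".split()

counts, total = train_unigram(train)
# Assumption: all unseen words share one extra vocabulary slot.
vocab_size = len(counts) + 1
ppl = perplexity(test, counts, total, vocab_size)
print(f"test perplexity: {ppl:.3f}")
```

For a real project, the same structure extends to bigrams or to other smoothing schemes; only the probability estimate inside the loop changes.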
Instructor:
Xiaojin (Jerry) Zhu
E-mail: jerryzhu@cs.wisc.edu
Web: http://pages.cs.wisc.edu/~jerryzhu
Phone: 608 890 0129
6391 Computer Science
Office Hours: Mondays 4-5pm, CS6391. Email me for additional appointments.
Class:
Spring 2008
Time/Place: 1:00-2:15pm Monday, Wednesday and Friday / 1207 Computer Science
TA: Chris Hinrichs, hinrichs@cs.wisc.edu. TA Office Hours: Fridays 10am-12pm, CS5390
Class mailing list: compsci769-1-s08@lists.wisc.edu, archive
Grading and Evaluation:
Grades will be based on course participation (required), 5 to 10 homework assignments (50%), and a project (50%).
Lectures:
Crib sheet
Notes 0
Language as a stochastic process
Notes 1
1/23 Wheel of Fortune (random variable, probability, estimation, likelihood, multinomial, MLE)
1/25 Assistive input device for the paralyzed (conditional probability, Bayes rule)
1/28 Zipf's law (the heavy tail of Miller's monkey)
Language Models
Notes 2 1/30, 2/1: YKWIM--teen chat decoder (language models, n-gram, MLE vs. MAP vs. Bayesian, Dirichlet, perplexity)
Notes 3 2/4, 2/6: The entropy of English (entropy, mutual information, KL divergence)
Text classification
Notes 4 2/8: Gracias, shukriya, xiexie: Language identification (Naive Bayes, graphical model, cross validation)
Notes 5 2/11, 2/13: Ha Ha Ha: Computational humor (logistic regression, regularization)
Notes 6 2/15, 2/18, 2/20: Dig it? Sentiment analysis (support vector machines)
Text summarization
2/22 Guest lecture: Andrew Goldberg [ notes | slides ]
Paired t-test
Notes 7 2/25, 2/27
Text clustering
Notes 8 2/29, 3/3 Hierarchical clustering, k-means clustering, spectral clustering
Information retrieval
Notes 9 3/5, 3/7 tf.idf, precision recall, link analysis (Google PageRank, Hubs and Authorities)
Semi-supervised learning
Notes 10 3/10, 3/12 The EM algorithm, mixture models, word sense disambiguation
Latent topic models
Notes 11 3/14, 3/24 Latent Semantic Indexing, Latent Dirichlet Allocation
Sequence modeling
Notes 12 3/26 Hidden Markov Models
Notes 13 3/28, 3/31 Inference in graphical models (factor graph, sum-product/belief propagation, max-product)
Notes 14 4/2 Conditional Random Fields
Parsing
4/4 Probabilistic Context Free Grammar
Discussions
4/7 Machine translation, speech recognition, text-to-speech, text-to-picture
Books and References:
This course does not follow a textbook. A collection of notes,
relevant papers, and materials will be prepared and distributed. Recommended
further readings are listed here.
Prerequisites:
CS 540 or equivalent, or instructor consent
This class as a driving school.