CS 769: Advanced Natural Language Processing


Project 
	
	A good project applies one or more methods discussed in class to an NLP (or your own research) problem.
	The amount of work is expected to be equivalent to 3 to 5 homework assignments.
	An excellent project addresses an issue with social impact, and/or is creative.
	Most projects should be individual; group projects are allowed with instructor approval.

	What we want:
	1. A short (1-2 paragraphs) proposal (3/24)
	2. Project update: short chat in my office (in April)
	3. A project report, 2-page extended abstract style (May 12). 
	   Please use the Word or LaTeX template for AAAI, available at
	   http://www.aaai.org/Publications/Author/author.php.  
	   The length limit is 2 pages, plus 1 additional page for references.
	4. A poster session (May 9 1-3pm)

	2008 projects
	2007 projects
	2006 projects
  	ACL wiki, in particular the "Resources" tab.
	Recent conferences: ACL 2007, EMNLP 2007, NAACL 2007, KDD 2007

	A few possible projects (You are strongly encouraged to propose your own project ideas instead):
	* Do something interesting to a non-conventional text collection, including 
	  software bug reports, network logs, conference proceedings in your area, etc.
	* Do something interesting to the cs.wisc.edu dataset (available here).
	* Do something interesting to the Google 1T 5-gram data
	* Present News on Google Maps (Location extraction.  Summarization. Visualization.  Clutter avoidance)
	* Apply one technique (language modeling, classification, clustering, latent topic modeling, sequence
  	  modeling, etc.) to one of the data collections on the ACL wiki
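
	As a rough illustration of the last idea, here is a minimal sketch of applying one technique
	(a unigram language model scored by perplexity) to a text collection. The toy sentences, the
	function names, and the choice of add-one smoothing are assumptions made for this example only;
	a real project would substitute its own data and method.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return word counts and the total number of training tokens."""
    counts = Counter(tokens)
    return counts, len(tokens)

def prob(word, counts, total, vocab_size):
    """Add-one (Laplace) smoothed unigram probability of a word."""
    return (counts[word] + 1) / (total + vocab_size)

def perplexity(tokens, counts, total, vocab_size):
    """Perplexity of a token sequence under the smoothed unigram model."""
    log_prob = sum(math.log2(prob(w, counts, total, vocab_size)) for w in tokens)
    return 2 ** (-log_prob / len(tokens))

# Toy data standing in for a real training and test collection.
train = "the cat sat on the mat".split()
test = "the dog sat".split()

counts, total = train_unigram(train)
vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
print(perplexity(test, counts, total, vocab_size))
```

	Lower perplexity on held-out text suggests a better fit, so the same scaffold can be used to
	compare models or smoothing choices on the same collection.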

Instructor:

	Xiaojin (Jerry) Zhu
	E-mail: jerryzhu@cs.wisc.edu
	Web: http://pages.cs.wisc.edu/~jerryzhu
	Phone:  608 890 0129 
	6391 Computer Science
	Office Hours: Mondays 4-5pm, CS6391.  Email me for additional appointments.
	
Class: 

	Spring 2008
	Time/Place: 1:00-2:15pm Monday, Wednesday and Friday / 1207 Computer Science
	TA: Chris Hinrichs, hinrichs@cs.wisc.edu.  TA Office hours: Fridays 10am-12pm, CS5390
	Class mailing list: compsci769-1-s08@lists.wisc.edu, archive
	
Grading and Evaluation:

	Grades will be based on course participation (required), 5 to 10 homework assignments (50%), and a project (50%).
	
Lectures: 

   Crib sheet
        Notes 0
   Language as a stochastic process
        Notes 1
   	1/23 Wheel of Fortune (random variable, probability, estimation, likelihood, multinomial, MLE)
	1/25 Assistive input device for the paralyzed (conditional probability, Bayes rule)
   	1/28 Zipf's law (the heavy tail of Miller's monkey)
   Language Models
   	Notes 2 1/30, 2/1: YKWIM--teen chat decoder (language models, n-gram, MLE vs. MAP vs. Bayesian, Dirichlet, perplexity)
	Notes 3 2/4, 2/6: The entropy of English (entropy, mutual information, KL divergence)
   Text classification
   	Notes 4 2/8: Gracias, shukriya, xiexie: Language identification (Naive Bayes, graphical model, cross validation)
	Notes 5 2/11, 2/13: Ha Ha Ha: Computational humor (logistic regression, regularization)
	Notes 6 2/15, 2/18, 2/20: Dig it? Sentiment analysis (support vector machines)
   Text summarization
   	2/22 Guest lecture: Andrew Goldberg [ notes | slides ]
   Paired t-test
	Notes 7 2/25, 2/27 
   Text clustering
   	Notes 8 2/29, 3/3 hierarchical clustering, K-means clustering, spectral clustering
   Information retrieval
   	Notes 9 3/5, 3/7 tf.idf, precision recall, link analysis (Google PageRank, Hubs and Authorities)
   Semi-supervised learning
   	Notes 10 3/10, 3/12 The EM algorithm, mixture models, word sense disambiguation
   Latent topic models
   	Notes 11 3/14, 3/24 Latent Semantic Indexing, Latent Dirichlet Allocation
   Sequence modeling
   	Notes 12 3/26 Hidden Markov Models
   	Notes 13 3/28, 3/31 Inference in graphical models (factor graph, sum-product/belief propagation, max-product)
   	Notes 14 4/2 Conditional Random Fields
   Parsing
   	4/4 Probabilistic Context Free Grammar
   Discussions
   	4/7 Machine translation, speech recognition, text-to-speech, text-to-picture

Books and References:

A textbook will not be followed in this course.  A collection of notes, 
relevant papers and materials will be prepared and distributed.  Recommended 
further readings are listed here.

Prerequisites: 

	CS 540 or equivalent, or instructor consent
	
This class as a driving school.