Books and References for CS 769

Matlab tutorials
A Very Elementary MATLAB Tutorial from Mathworks

Reference books
[cB] Christopher M. Bishop, Pattern Recognition and Machine Learning. Springer Verlag, 2006.
[MS] Manning & Schutze, Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[JM] Jurafsky & Martin, Speech and Language Processing. Prentice Hall, 2000.
[HTF] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, 2009. Available online.
[dM] David MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2002.

Mathematical background
[cB] 1.2, Appendix B, C, E
[dM] 2 or [MS] 2.1
Iain Murray's crib sheet.
Sam Roweis' matrix identities.
Stephen Boyd and Lieven Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
Dan Klein's Lagrange Multipliers without Permanent Scarring.
Peter Doyle and Laurie Snell. Random Walks and Electric Networks. Mathematical Association of America, 1984.

Statistics of the English language
[MS] 4.2, 1.4.2, 1.4.3
Zipf's law
Wentian Li, Comments to "Bell Curves and Monkey Languages", 1996.
Wentian Li. Random texts exhibit Zipf's-law-like word frequency distribution. IEEE Transactions on Information Theory, 38(6), 1842-1845, 1992.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274, 1926-1928.
Lillian Lee. "I'm sorry Dave, I'm afraid I can't do that": Linguistics, Statistics, and Natural Language Processing circa 2001. Computer Science: Reflections on the Field, Reflections from the Field, pp. 111--118, 2004.

Language modeling
[cB] 2.1, 2.2
[MS] 6 or [JM] 6
Stanley F. Chen and Joshua Goodman, An empirical study of smoothing techniques for language modeling. TR-10-98, Computer Science Group, Harvard University, 1998.
Ronald Rosenfeld. Two decades of Statistical Language Modeling: Where Do We Go From Here?
Proceedings of the IEEE, 88(8), 2000.
Yee Whye Teh. A Bayesian Interpretation of Interpolated Kneser-Ney. Technical Report TRA2/06, School of Computing, NUS.
Thorsten Brants, Ashok C. Popat, Peng Xu, Franz J. Och and Jeffrey Dean. Large Language Models in Machine Translation. EMNLP 2007.
A Hierarchical Dirichlet Language Model. David MacKay, Linda Peto. 1994.
The CMU-Cambridge Statistical Language Modeling toolkit v2

The entropy of a language, information theory
[cB] 1.6, including a nice introduction to differential entropy
[MS] 2.2 or [JM] 6.7
Brown, Della Pietra, Mercer, Della Pietra, Lai. An estimate of an upper bound for the entropy of English. Computational Linguistics, 18(1), pp. 31-40, 1992.
Claude Shannon. A mathematical theory of communication.
Thomas Cover and Joy Thomas. Elements of information theory. ISBN 0471062596

Information retrieval and link analysis
John Lafferty and Chengxiang Zhai. Probabilistic relevance models based on document and query generation. In Language Modeling and Information Retrieval, Kluwer International Series on Information Retrieval, Vol. 13, 2003.
ChengXiang Zhai, John Lafferty. A study of smoothing methods for language models applied to information retrieval. ACM Transactions on Information Systems, Vol. 22, No. 2, April 2004.
[MS] 15
The Lemur toolkit
Lawrence Page, Sergey Brin, Rajeev Motwani and Terry Winograd. The PageRank Citation Ranking: Bringing Order to the Web. Stanford Digital Library Technologies Project, 1998.
Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 604--632, 1999.
C. Faloutsos, T. Kolda and J. Sun. Mining Large Time-evolving Data Using Matrix and Tensor Tools. ICML 2007 tutorial, Corvallis, OR, USA.

Document summarization
P. Turney. Learning to extract keyphrases from text. Technical report, National Research Council, Institute for Information Technology, 1999.
A. Hulth. Improved automatic keyword extraction given more linguistic knowledge. In Proc. Conf.
Empirical Methods in Natural Language Processing, 2003.
R. Mihalcea and P. Tarau. TextRank: Bringing order into texts. In Proc. Conf. Empirical Methods in Natural Language Processing, 2004.
G. Erkan and D. Radev. 2004. LexRank: Graph-based centrality as salience in text summarization. Journal of Artificial Intelligence Research.
X. Zhu, A. Goldberg, J. Van Gael and D. Andrzejewski. Improving Diversity in Ranking using Absorbing Random Walks. NAACL-HLT, 2007.

Text categorization: Naive Bayes, logistic regression
[cB] 8.1, 8.2 for Naive Bayes; 4.3 for logistic regression.
A Comparison of Event Models for Naive Bayes Text Classification. Andrew McCallum and Kamal Nigam. AAAI-98 Workshop on "Learning for Text Categorization".
Andrew McCallum's rainbow statistical text classification code
Adam Berger, Stephen Della Pietra, and Vincent Della Pietra, 1996. A maximum entropy approach to natural language processing. Computational Linguistics 22(1).
Ronald Rosenfeld. A Maximum Entropy Approach to Adaptive Statistical Language Modeling. Computer Speech and Language 10, 187--228, 1996.
Stanley Chen and Ronald Rosenfeld. Efficient Sampling and Feature Selection in Whole Sentence Maximum Entropy Language Models. In Proc. ICASSP '99, Phoenix, Arizona, March 1999.
Zhang Le's MaxEnt page
Y. Dan Rubinstein and Trevor Hastie, 1997. Discriminative vs Informative Learning. Proc. of KDD.
Andrew Y. Ng and Michael Jordan, 2002. On discriminative vs. generative classifiers: A comparison of logistic regression and Naive Bayes. Proc. of NIPS.
Florian Wolf, Tomaso Poggio and Pawan Sinha, 2006. Human Document Classification Using Bags of Words. Tech report MIT-CSAIL-TR-2006-054.

Sentiment, humor, gender analysis with Support Vector Machines
[cB] 7.1
Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the European Conference on Machine Learning (ECML), Springer, 1998.
Bo Pang and Lillian Lee.
Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2(1-2), pp. 1-135, 2008.
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP, 2002.
The Yahoo! SentimentAI group: Sentiment and Affect in Text. (need to join the group)
Bing Liu's Opinion Mining page.
Rada Mihalcea and Carlo Strapparava. Making Computers Laugh: Investigations in Automatic Humor Recognition. EMNLP, 2005.
Moshe Koppel, Shlomo Argamon, Anat Rachel Shimoni. Automatically Categorizing Written Texts by Author Gender. Literary and Linguistic Computing 17(4), November 2002, pp. 401-412.
Chris Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Knowledge Discovery and Data Mining, 2(2), 1998.
Alex J. Smola and Bernhard Scholkopf. A Tutorial on Support Vector Regression. NeuroCOLT Technical Report TR-98-030, 1998.
Thorsten Joachims' SVM-light code

Clustering
Ulrike von Luxburg. A Tutorial on Spectral Clustering. Statistics and Computing 17(4), 395-416, December 2007.
ICML 2004 tutorial on spectral clustering by Chris Ding
Fernando Pereira, Naftali Tishby and Lillian Lee. Distributional clustering of English words. Proceedings of the 31st Annual Meeting of the Association for Computational Linguistics, 1993.
C.J.C. Burges. Dimension Reduction: A Guided Tour. Foundations and Trends in Machine Learning, 2010.

Semi-supervised learning: using both labeled and unlabeled data
[cB] 9 for the EM algorithm.
Self-training for word sense disambiguation: David Yarowsky, 1995. Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189--196.
Text Classification from Labeled and Unlabeled Documents using EM. Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell. Machine Learning, 39(2/3), pp. 103-134, 2000.
Combining Labeled and Unlabeled Data with Co-Training. Avrim Blum and Tom Mitchell.
Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92--100, 1998.
T. Joachims, Transductive Inference for Text Classification using Support Vector Machines. Proceedings of the International Conference on Machine Learning (ICML), 1999.
Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. Xiaojin Zhu, Zoubin Ghahramani, John Lafferty. The Twentieth International Conference on Machine Learning (ICML-2003).
Semi-Supervised Learning Literature Survey. Xiaojin Zhu, Computer Sciences TR 1530, University of Wisconsin - Madison.

Latent topic models
Semantic space via probabilistic Latent Semantic Analysis, latent Dirichlet allocation
Probabilistic Latent Semantic Analysis. Thomas Hofmann. Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI'99).
Probabilistic Latent Semantic Indexing. Thomas Hofmann. Proceedings of the 22nd International Conference on Research and Development in Information Retrieval (SIGIR'99).
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January 2003.
Griffiths, T., & Steyvers, M. Finding Scientific Topics. Proceedings of the National Academy of Sciences, 101 (suppl. 1), 5228-5235, 2004.

Part of Speech tagging with Hidden Markov Models
[MS] 10 for POS tagging
[cB] 13.2, [MS] 9, or [JM] 7.1-7.4 for HMM
Zoubin Ghahramani, 2001. An Introduction to Hidden Markov Models and Bayesian Networks. International Journal of Pattern Recognition and Artificial Intelligence 15(1):9-42.
Lawrence R. Rabiner, 1989. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77(2), pp. 257-286. (An Erratum by Ali Rahimi)
David Elworthy, 1994. Does Baum-Welch Re-estimation help taggers? Proceedings of the 4th Conference on Applied Natural Language Processing.
Kevin Murphy's Hidden Markov Model (HMM) Toolbox for Matlab
Stanford Log-linear Part-Of-Speech Tagger

Information extraction with Conditional Random Fields
John Lafferty, Andrew McCallum, Fernando Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML-2001), 2001.
Charles Sutton and Andrew McCallum. An Introduction to Conditional Random Fields for Relational Learning. In Introduction to Statistical Relational Learning. Edited by Lise Getoor and Ben Taskar. MIT Press, 2006.
Fei Sha and Fernando Pereira. Shallow Parsing with Conditional Random Fields. Proceedings of HLT-NAACL 2003.
Andrew McCallum. Efficiently Inducing Features of Conditional Random Fields. Uncertainty in AI, 2003.
Hanna Wallach's conditional random fields page
Andrew McCallum's MALLET code

Parsing and context free grammars
[MS] 11 or [JM] 9, 12
Detlef Prescher. A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars. The 15th European Summer School in Logic, Language and Information (ESSLLI-03).

Machine Translation
Adam L. Berger, Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, John R. Gillett, John D. Lafferty, Robert L. Mercer, Harry Printz, and Lubos Ures, 1994. The Candide System for Machine Translation. Proceedings of the 1994 ARPA Workshop on Human Language Technology.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer, 1993. The Mathematics of Statistical Machine Translation. Computational Linguistics 19(2), pp. 263--311.
Papineni, Roukos, Ward, Zhu. Bleu: a Method for Automatic Evaluation of Machine Translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 311-318.

Speech
The CMU Pronouncing Dictionary
Spoken Document Retrieval
The TREC spoken document retrieval track: A success story, 1999
SpeechFind

Related Courses
text data mining, Callan, CMU
human language technologies, Callan, Black, Lavie, CMU
information retrieval, Callan, Yang, CMU
natural language processing, Cardie, Cornell
learning to turn words into data, Cohen, CMU
Machine Learning Approaches for Natural Language Processing, Collins, MIT
introduction to bioinformatics, Craven, U Wisconsin
advanced bioinformatics, Craven, U Wisconsin
machine learning for text analysis, Craven, Shavlik, U Wisconsin
Empirical Methods in Natural Language Processing, Koehn, Edinburgh
statistical foundations of machine learning, Lafferty, Wasserman, CMU
algorithms for NLP, Lavie, Frederking, CMU
statistical natural language processing: models and methods, Lee, Cornell
natural language processing, Lee, Cornell
statistical methods for artificial intelligence, McAllester, TTI-C
introduction to natural language processing, McCallum, U Mass
natural Language Processing, Mihalcea, University of North Texas
advanced methods in artificial intelligence, Page, U Wisconsin
topics in Natural Language Processing, Ringger, BYU
language and statistics, Rosenfeld, CMU
machine learning, Shavlik, U Wisconsin
speech recognition and understanding, Schultz, Waibel, CMU
Graphs and Networks, Spielman, Yale
Practical Machine Learning, Jordan, Berkeley
Topics in machine learning, Sha, USC
Computational Data Analysis: FOUNDATIONS OF MACHINE LEARNING & DATA MINING, Gray, Georgia Tech
Analysis of Social Media, Cohen and Glance, CMU
Text-Driven Forecasting, Smith, CMU