ICML'11 Tutorial on

Machine Learning for Large Scale Recommender Systems

Deepak Agarwal and Bee-Chung Chen
Yahoo! Research
{dagarwal,beechun}@yahoo-inc.com

Topic Overview

We will provide an in-depth introduction to the machine learning challenges that arise in the context of recommender problems for web applications. Since Netflix released a large movie ratings dataset, recommender problems have received considerable attention at ICML. The focus of this tutorial, however, is on web applications, and we will cover topics that are significant extensions of those discussed in the context of the Netflix contest. While the research pursued in the context of Netflix made great advances in ``offline'' modeling (fitting models to historical data obtained from a recommender system), this tutorial goes beyond that and discusses important issues in both ``offline'' and ``online'' modeling (the science of designing explore/exploit schemes that serve items to users in an optimal fashion).

At an abstract level, the goal of recommender systems is to match items from an inventory to each user visit in some context. Each display results in some response from the user (clicks, ratings, and so on) that updates our belief about user preferences and leads to better recommendations in the future. The machine learning challenge is to construct a serving scheme, or sequential design, that learns user preferences through interactions with items in order to maximize some utility function over a long time horizon. For instance, a portal like Yahoo! may be interested in constructing a serving scheme that displays articles to users visiting its front page to maximize click rates. In online advertising, an ad exchange is interested in estimating the utility of an ad being shown to a user visiting a publisher page.
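As a concrete (and deliberately simplified) illustration of such a serving scheme, the sketch below uses an epsilon-greedy rule with Beta-smoothed click-rate estimates to choose which article to show; the article IDs, the epsilon value, and the prior counts are illustrative assumptions, not part of any production system.

import random

# A minimal epsilon-greedy serving scheme (illustrative sketch only):
# with probability epsilon we explore a random article, otherwise we
# exploit the article with the highest smoothed click-rate estimate.
articles = ["article_1", "article_2", "article_3"]   # hypothetical item IDs
clicks = {a: 0 for a in articles}
views = {a: 0 for a in articles}

def estimated_ctr(a):
    # Beta(1, 1) smoothing avoids division by zero for unseen articles.
    return (clicks[a] + 1.0) / (views[a] + 2.0)

def serve(epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(articles)            # explore
    return max(articles, key=estimated_ctr)       # exploit

def update(article, clicked):
    # Each observed response updates our belief about the article's click rate.
    views[article] += 1
    clicks[article] += int(clicked)

In a real system the serving decision would also condition on user and context features, but the basic explore/exploit structure is the same.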

We believe this tutorial will be of great interest to ICML attendees. Recommender problems are pervasive, but their success depends on effectively matching users to items in different contexts based on utilities like click rates, buy rates, movie ratings, and so on. The utilities are not known and have to be estimated from data. This is a challenging machine learning problem since the response arises through strong interactions among several categorical variables, like user and item, that have many levels (one per user/item ID) with heavy-tailed distributions (a small number of levels have large sample sizes, while the rest have small samples). Such an unbalanced data distribution induces significant data sparseness and makes regularization and smoothing essential. Several modern machine learning models that tackle the problem of data sparsity in high-dimensional categorical data are germane to this class of applications. Beyond learning good models in high-dimensional, sparse settings, an important issue in recommender problems is online learning in general and explore/exploit problems in particular. This matters for good performance because the goal is often to maximize some utility, like total clicks or total revenue, over some time period; optimizing for out-of-sample predictive accuracy alone is not sufficient. Bandit problems are of great interest in machine learning, and this class of applications provides a great opportunity to motivate new theoretical research in the area. Beyond modeling, two other machine learning issues are important in recommender systems: model evaluation on offline data and learning intrinsic item quality from biased data.
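To make the sparsity and smoothing issue concrete, the toy example below shrinks a raw per-item click-through rate toward the global rate using a simple pseudo-count prior; the counts and the prior strength are invented purely for illustration and are not drawn from any real data.

# Toy illustration of smoothing sparse, heavy-tailed count data:
# the raw click rate of a rarely shown item is very noisy, so we shrink
# it toward the global rate with a pseudo-count prior (strength chosen
# arbitrarily here for illustration).
items = {"head_item": (900, 30000), "tail_item": (2, 5)}  # item -> (clicks, views), invented counts

total_clicks = sum(c for c, v in items.values())
total_views = sum(v for c, v in items.values())
global_ctr = total_clicks / total_views

prior_strength = 100.0  # pseudo-views; larger values mean stronger shrinkage

for name, (clicks, views) in items.items():
    raw = clicks / views
    smoothed = (clicks + prior_strength * global_ctr) / (views + prior_strength)
    print(f"{name}: raw CTR = {raw:.3f}, smoothed CTR = {smoothed:.3f}")

The head item's estimate barely moves, while the tail item's estimate is pulled strongly toward the global rate, which is exactly the behavior the hierarchical and latent factor models discussed in the tutorial formalize.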

Intended Audience

We will only assume graduate-level knowledge of statistical and machine learning methods; no prior knowledge of recommender systems is required. In fact, the first hour, which provides an introduction to the area, would be appropriate for everyone attending ICML 2011. The second half will require familiarity with basic concepts like regression and probability distributions, and an appreciation of the issues involved in fitting machine learning models to large-scale applications. We will not assume any prior knowledge of multi-armed bandits or matrix factorization, but some familiarity would help the attendee recognize new opportunities for both empirical and theoretical work in this area.

Content details

The tutorial will begin with a formal definition of the problem through real-life examples drawn from actual applications like content optimization, computational advertising, movie recommendation, and web search. We provide a detailed discussion of the various scenarios under which recommendations may have to be made, including varying item pool size, dynamic versus static item pools, degrees of cold start for both users and items, the number of items to be selected for each display, user fatigue, and so on. These scenarios go significantly beyond the classical movie recommendation problem. We then provide a comprehensive overview of state-of-the-art techniques in this area, with detailed discussion of several open problems that we hope will stimulate further research. In particular, we describe time-series models, multi-armed bandit schemes, regression approaches, matrix factorization approaches, cold-start techniques, and similarity-based approaches, all illustrated through real-world examples. We will end with a detailed discussion of several technical challenges in this area. Throughout, we will draw heavily on examples from the application areas of content optimization and computational advertising.
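To give a flavor of the matrix factorization approaches covered in the tutorial, the sketch below fits user and item latent factors by stochastic gradient descent on a handful of invented (user, item, rating) triples; the rank, learning rate, and regularization constant are arbitrary illustrative choices rather than recommended settings.

import numpy as np

# Sketch of matrix factorization fit by stochastic gradient descent on
# (user, item, rating) triples; all data and hyperparameters are invented
# for illustration.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0)]
n_users, n_items, rank = 3, 2, 4
rng = np.random.default_rng(0)
U = 0.1 * rng.standard_normal((n_users, rank))   # user latent factors
V = 0.1 * rng.standard_normal((n_items, rank))   # item latent factors
lr, reg = 0.05, 0.02                              # learning rate, L2 regularization

for epoch in range(200):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                     # prediction error for this triple
        U[u] += lr * (err * V[i] - reg * U[u])    # gradient step on user factors
        V[i] += lr * (err * U[u] - reg * V[i])    # gradient step on item factors

print("Predicted rating for user 1, item 1:", float(U[1] @ V[1]))

The tutorial discusses how such factor models are extended with regression priors and other side information to handle the cold-start scenarios mentioned above.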

Outline of Material: The tutorial is organized into the parts corresponding to the slide sets listed below.

Slides

Please note that the slides are currently under revision, but they should give a good idea of the content to be covered.
  1. Introduction (PDF)
  2. Offline components (PDF)
  3. Online components (PDF)
  4. Evaluation methods and challenges (PDF)

References

  1. A.S. Das, M. Datar, A. Garg, and S. Rajaram. Google News Personalization: Scalable Online Collaborative Filtering. In WWW, 2007.
  2. J. Gittins. Bandit Processes and Dynamic Allocation Indices. J. of the Royal Stat. Society, B, vol. 41, 1979.
  3. T.L. Lai and H. Robbins. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 1985.
  4. P. Whittle. Restless Bandits: Activity Allocation in a Changing World. J. of Applied Probability, 1988.
  5. P. Auer, N. Cesa-Bianchi and P. Fischer. Finite-time Analysis of the Multiarmed Bandit Problem. Machine Learning, 2002.
  6. D. Agarwal and S. Merugu. Predictive Discrete Latent Factor Models for Large Scale Dyadic Data. In KDD, 2007.
  7. D. Agarwal, A.Z. Broder, D. Chakrabarti, D. Diklic, V. Josifovski, and M. Sayyadian. Estimating Rates of Rare Events at Multiple Resolutions. In KDD, 2007.
  8. D. Agarwal and D. Chakrabarti. Statistical Challenges in Online Advertising (Tutorial). In CIKM, 2008; In KDD, 2009.
  9. D. Agarwal. Statistical Challenges in Internet Advertising. Galit Shmueli and Wolfgang Jank (eds.). Wiley, 2008.
  10. D. Agarwal and L. Zhang. Fast computation of posterior mode in multi-level hierarchical models. In NIPS, 2008.
  11. D. Agarwal, B.-C. Chen, P. Elango, R. Ramakrishnan, S. Park, S. Roy, N. Motgi and J. Zachariah. Online Models for Content Optimization. In NIPS, 2008.
  12. D. Agarwal. Statistics for Computational Advertising. ISBA Bulletin, 16(3):14-17, 2009.
  13. D. Agarwal, B.-C. Chen and P. Elango. Spatio-Temporal Models for Estimating Click-Through Rates. In WWW, 2009.
  14. D. Agarwal, B.-C. Chen and P. Elango. Explore/Exploit Schemes for Web Content Optimization. In ICDM, 2009.
  15. D. Agarwal and B.-C. Chen. Regression based Latent Factor Models. In KDD, 2009.
  16. D. Agarwal, E. Gabrilovich, R. Hall, V. Josifovski and R. Khanna. Translating Relevance Scores to Probabilities. In CIKM, 2009.
  17. D. Agarwal, R. Agrawal, R. Khanna and N. Kota. Estimating Rates of Rare Events with Multiple Hierarchies through Scalable Log-linear Models. In KDD, 2010.
  18. D. Agarwal, B.-C. Chen and P. Elango. Fast Online Learning through Offline Initialization for Time-Sensitive Recommendation. In KDD, 2010.
  19. D. Agarwal and B.-C. Chen. fLDA: Matrix Factorization through Latent Dirichlet Allocation. In WSDM, 2010.
  20. D. Agarwal and B.-C. Chen. Latent OLAP: Data Cubes over Latent Variables. In SIGMOD, 2011.
  21. D. Agarwal, B.-C. Chen and B. Long. Localized Factor Models for Multi-Context Recommendation. In KDD, 2011.
  22. D. Agarwal, B.-C. Chen, P. Elango and X. Wang. Click Shaping to Optimize Multiple Objectives. In KDD, 2011.
  23. D. Chakrabarti, D. Agarwal, and V. Josifovski. Contextual Advertising by Combining Relevance with Click Feedback. In WWW, 2008.
  24. Y. Koren. Collaborative Filtering with Temporal Dynamics. In KDD, 2009.
  25. Z. Lu, D. Agarwal and I. Dhillon. A Spatio-Temporal Approach to Collaborative Filtering. In RecSys, 2009.
  26. S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski. Bandits for Taxonomies: A Model-Based Approach. In SDM, 2007.
  27. S. Pandey, D. Chakrabarti, and D. Agarwal. Multi-Armed Bandit Problems with Dependent Arms. In ICML, 2007.
  28. F. Radlinski, R. Kleinberg, and T. Joachims. Learning Diverse Rankings with Multi-Armed Bandits. In ICML, 2008.
  29. M. Streeter and D. Golovin. An Online Algorithm for Maximizing Submodular Functions. In NIPS, 2008.
  30. Y. Yue and T. Joachims. Interactively Optimizing Information Retrieval Systems as a Dueling Bandits Problem. In ICML, 2009.