We will provide an in-depth introduction of machine learning challenges that arise in the context of recommender problems for web applications. Since Netflix released a large movie ratings dataset, recommender problems have received considerable attention at ICML. The focus of this tutorial however is on web applications and we will cover topics that are significant extensions of those discussed in the context of the Netflix contest. While the research pursued in the context of Netflix made great advances in ``offline-modeling''' (fitting model to historic data obtained from a recommender system), this tutorial goes beyond this and discusses important issues for both ``offline'' and ``online'' modeling (the science of designing explore/exploit schemes that serve items to users in an optimal fashion).
At an abstract level, the goal of recommender systems is to match items from an inventory for each user visit in some context. Each display results in some response from the user (clicks, ratings and so on) that updates our belief about user preferences and results in better recommendations in the future. The machine learning challenge is to construct a serving scheme or sequential design that learns user preferences through interactions with items in order to maximize some utility function over a long time horizon. For instance, a portal like Yahoo! may be interested in constructing a serving scheme that displays articles to users visiting their front page to maximize click rates. In online advertising, an ad-exchange is interested in estimating the utility of an ad being shown to a user visiting a publisher page.
We believe this tutorial will be of great interest to ICML attendees. Recommender problems are pervasive but their success depends on effectively matching users to items in different contexts based on some utilities like click-rates, buy rates, movie ratings and so on. The utilities are not known and have to be estimated through data. This is a challenging machine learning problem since the response is obtained through strong interactions among several categorical variables like user and item that have many levels (one per user/item ID) with heavy tailed distributions (small number of levels have large sample sizes, the rest of them get small samples). Such unbalanced data distribution induces significant data sparseness and makes regularization and smoothing essential. Several modern machine learning models that tackle the problem of data sparsity in high dimensional categorical data are germane for these class of applications. Other than learning good models in high dimensional scenarios with data sparsity, an important issue in recommendation problems is online methods in general and explore/exploit problems in particular. This is important for good performance since often the goal is to maximize some utility like total clicks or total revenue in some time period, optimizing for out-of-sample predictive accuracy is not sufficient. Bandit problems are of great interest in machine learning, these class of applications provide a great opportunity to motivate new theoretical research in this area. Other than modeling, two other important machine learning issues are important in recommender systems --- model evaluation on offline data and learning intrinsic item quality from biased data.
We will only assume graduate level knowledge in statistical and machine learning methods, no prior knowledge of recommender system is required. In fact, the first one hour that provides an introduction to the area would be appropriate for everyone attending ICML 2011. The second half would require familiarity with basic concepts like regression, probability distributions and appreciation of issues involved in fitting machine learning models to large scale applications. We will not assume any prior knowledge of multi-armed bandits or matrix factorization but some familiarity would help the attendee recognize new opportunities for both empirical and theoretical work in this area.
The tutorial will begin with a formal definition of the problem through real life examples drawn from actual applications like content optimization, computational advertising, movie recommendation and web search. We provide a detailed discussion of various scenarios under which recommendations may have to be made, including varying item pool size, dynamic versus static item pool, degrees of cold-start both for users and items, number of items to be selected for each display, user fatigue and so on. These scenarios go significantly beyond the classical movie recommendation problem. We then provide a comprehensive overview of state-of-the-art techniques in this area with detailed discussion of several open problems that we hope will stimulate further research. In particular, we describe time-series models, multi-armed bandit schemes, regression approaches, matrix factorization approaches, cold-start, similarity-based approaches, which are exemplified through real world examples. We will end with a detailed discussion of several technical challenges in this area. Throughout, we will draw heavily on examples drawn from the application areas of content optimization and computational advertising.
Outline of Material: The outline is as follows.