Explore-Exploit for Recommender Problems

Deepak K. Agarwal; Bee-Chung Chen

doi:10.1017/CBO9781139565868.004

In Chapter 2, we reviewed classical methods to score an item for a user in a given context. In this chapter, we describe more modern scoring techniques, particularly those based on explore-exploit methods.

Scoring involves estimating the “value” of an item according to some criteria. Because explicit user intent in most modern-day recommender problems is at best partially observed and often weak, scoring items based on the predicted value of some rating or response is a popular mechanism. For ease of exposition, we take the response to be binary: the “positive” label corresponds to a positive user interaction with the item, such as a click, like, or share, whereas “negative” indicates the lack of any such interaction. For instance, in a news recommender problem, a user click on a recommended item is the primary response variable. The score of an item is given by its response rate, which is the expected value of the user response. For a binary response, the response rate is therefore the probability of a positive response. Throughout this chapter, we use click to refer to a positive response and CTR to refer to the corresponding response rate.
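As a compact restatement of the definition above, the response rate can be written as follows. The symbols $y_i$, $p_i$, $c_i$, and $n_i$ are notation assumed here for illustration and do not appear in the original text; the empirical ratio in the second line is the usual sample estimate, not necessarily the estimator used in the chapter.

```latex
% y_i \in \{0, 1\}: binary response when item i is shown (1 = click)
% p_i: response rate (CTR) of item i
p_i \;=\; \mathbb{E}[y_i] \;=\; \Pr(y_i = 1)

% A simple empirical estimate from c_i clicks observed in n_i displays (n_i > 0):
\hat{p}_i \;=\; \frac{c_i}{n_i}
```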

Often the goal in recommender problems is to maximize the total number of positive responses, such as clicks on recommended items. For instance, in a news recommender problem, maximizing the total clicks on the recommended news articles is an important objective. If item response rates were known, this would be easy to accomplish by always recommending the item with the highest response rate. However, because response rates are not known, a key task is to estimate them accurately for the items in the item pool. In Sections 2.3 and 2.4, we considered various supervised learning methods to estimate response rates through feature-based models and collaborative filtering. In this chapter, we show that scoring items in a recommender problem is not purely a supervised learning problem but, more importantly, an explore-exploit problem. We need to strike a good balance between exploring, that is, experimenting with items that are new or have a small sample size by displaying them in a certain number of user visits, and exploiting items that are known with high statistical certainty to have high response rates.
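The trade-off described above can be illustrated with a small simulation. The sketch below uses epsilon-greedy, a simple and widely known explore-exploit rule chosen here only for illustration; it is not claimed to be the method developed in this chapter. The function name `epsilon_greedy`, its parameters, and the simulated click model are all assumptions made for this example.

```python
import random


def epsilon_greedy(true_ctrs, n_visits, epsilon, seed=0):
    """Simulate item selection over n_visits user visits.

    true_ctrs: hypothetical true CTR per item (unknown to the policy).
    epsilon:   probability of exploring a random item on each visit.
    Returns (shows, clicks): per-item display and click counts.
    """
    rng = random.Random(seed)
    n_items = len(true_ctrs)
    shows = [0] * n_items
    clicks = [0] * n_items

    def estimated_ctr(i):
        # Items never shown get an estimate of 0.0 (an assumption of this sketch).
        return clicks[i] / shows[i] if shows[i] else 0.0

    for _ in range(n_visits):
        if rng.random() < epsilon:
            # Explore: show an item chosen uniformly at random.
            item = rng.randrange(n_items)
        else:
            # Exploit: show the item with the highest estimated CTR
            # (ties go to the lowest index).
            item = max(range(n_items), key=estimated_ctr)
        shows[item] += 1
        # Simulate the binary response from the item's true CTR.
        if rng.random() < true_ctrs[item]:
            clicks[item] += 1
    return shows, clicks


if __name__ == "__main__":
    # Item 1 is far better than item 0, but the policy does not know that.
    print(epsilon_greedy([0.0, 1.0], n_visits=100, epsilon=0.0))
    print(epsilon_greedy([0.0, 1.0], n_visits=1000, epsilon=0.1))
```

With `epsilon=0.0` the policy never explores: every estimate starts at zero, the tie goes to item 0, and the better item is never displayed, so no clicks are collected. With a small positive `epsilon`, the better item is eventually tried, its estimate rises, and most later visits exploit it.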


3 - Explore-Exploit for Recommender Problems
