
Policies without Memory for the Infinite-Armed Bernoulli Bandit under the Average-Reward Criterion

Published online by Cambridge University Press: 27 July 2009

Stephen J. Herschkorn
Affiliation:
School of Business and RUTCOR, Rutgers University, New Brunswick, New Jersey 08903
Erol Peköz
Affiliation:
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720
Sheldon M. Ross
Affiliation:
Department of Industrial Engineering and Operations Research, University of California, Berkeley, California 94720

Abstract

We consider a bandit problem with infinitely many Bernoulli arms whose unknown parameters are i.i.d. We present two policies that maximize the almost-sure average reward over an infinite horizon. Neither policy ever returns to a previously observed arm after switching to a new one, nor retains information from discarded arms; under both, a run of failures triggers the switch to a new arm. The first policy is nonstationary and requires no information about the distribution of the Bernoulli parameter. The second is stationary and requires only partial information; its optimality is established via renewal theory. We also develop ε-optimal stationary policies that require no information about the distribution of the unknown parameter and discuss universally optimal stationary policies.
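The switching rule can be made concrete with a short simulation. The sketch below is illustrative rather than the authors' construction: it assumes a Uniform(0, 1) prior on the arm parameter and a fixed failure-run threshold, roughly the stationary, ε-optimal flavor mentioned above; the name failure_run_policy and both of its parameters are introduced here only for illustration.

    import random

    def failure_run_policy(horizon, run_length, seed=None):
        # Memoryless policy: play the current arm until it produces
        # run_length consecutive failures, then draw a fresh arm.
        # Nothing is remembered about discarded arms, and the policy
        # never returns to one.
        rng = random.Random(seed)
        p = rng.random()          # assumed Uniform(0, 1) parameter of the current arm
        failures = 0              # length of the current failure run
        total_reward = 0
        for _ in range(horizon):
            if rng.random() < p:  # Bernoulli(p) pull succeeds
                total_reward += 1
                failures = 0
            else:
                failures += 1
                if failures == run_length:
                    p = rng.random()  # discard the arm and start fresh; no memory kept
                    failures = 0
        return total_reward / horizon

    # Larger thresholds retain good arms longer; under the uniform prior the
    # long-run average should approach 1 as run_length grows, in the spirit
    # of the ε-optimal stationary policies described in the abstract.
    for r in (1, 2, 5, 10):
        print(r, failure_run_policy(10**6, r, seed=2024))

The fixed threshold only illustrates the stationary, ε-optimal case; the abstract does not specify the optimal policies themselves, and the nonstationary one in particular need not keep the threshold constant over time.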

Type
Research Article
Copyright
Copyright © Cambridge University Press 1996

