
Worth of perfect information in Bernoulli bandits

Published online by Cambridge University Press:  01 July 2016

Donald A. Berry*
Affiliation: University of Minnesota

Robert P. Kertz**
Affiliation: Georgia Institute of Technology

* Postal address: School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA. Supported in part by NSF Grants DMS 85-05023 and DMS 88-03087.
** Postal address: School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA.

Abstract

For k-armed Bernoulli bandits with discounting, sharp comparisons are given between average optimal rewards for a gambler and for a ‘perfectly informed’ gambler, over natural collections of prior distributions. Some of these comparisons are proved under general discounting, and others under non-increasing discount sequences. Connections are made between these comparisons and the concept of ‘regret’ in the minimax approach to bandit processes. Identification of extremal cases in the sharp comparisons is emphasized.
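To make the quantity being compared concrete, the sketch below (ours, not taken from the article) computes the worth of perfect information numerically for a two-armed Bernoulli bandit with independent uniform (Beta(1,1)) priors and a finite horizon n, i.e. the uniform discount sequence a_m = 1 for m ≤ n. The perfectly informed gambler always plays the better arm and earns n·E[max(p1, p2)] = 2n/3 under these priors; the ordinary gambler's optimal average reward is obtained by backward induction on the Beta-posterior state. The function names, the choice of two arms, and the uniform priors are illustrative assumptions only.

from functools import lru_cache

def worth_of_perfect_information(n: int) -> float:
    """Worth of perfect information for a 2-armed Bernoulli bandit with
    horizon n and independent uniform (Beta(1,1)) priors on each arm."""
    # Perfectly informed gambler: always plays the better arm, so the
    # expected total reward is n * E[max(p1, p2)] = 2n/3 for uniform priors.
    informed = 2.0 * n / 3.0

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2, remaining):
        # Bayes-optimal expected future reward when arm i has s_i observed
        # successes and f_i failures, with `remaining` pulls left.
        if remaining == 0:
            return 0.0
        p1 = (1 + s1) / (2 + s1 + f1)   # posterior mean of arm 1
        p2 = (1 + s2) / (2 + s2 + f2)   # posterior mean of arm 2
        pull1 = (p1 * (1 + value(s1 + 1, f1, s2, f2, remaining - 1))
                 + (1 - p1) * value(s1, f1 + 1, s2, f2, remaining - 1))
        pull2 = (p2 * (1 + value(s1, f1, s2 + 1, f2, remaining - 1))
                 + (1 - p2) * value(s1, f1, s2, f2 + 1, remaining - 1))
        return max(pull1, pull2)

    bayes = value(0, 0, 0, 0, n)
    return informed - bayes

if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        print(f"n = {n:2d}: worth of perfect information = "
              f"{worth_of_perfect_information(n):.4f}")

For n = 1 this gives 2/3 − 1/2 = 1/6, since the informed gambler earns E[max(p1, p2)] = 2/3 while the uninformed gambler can do no better than the prior mean 1/2; the paper's results bound such gaps sharply over collections of priors and discount sequences.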

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1991 


Footnotes

Supported in part by NSF Grants DMS 86-01153 and DMS 88-01818.
