On the evaluation of suboptimal strategies for families of alternative bandit processes

K. D. Glazebrook

doi:10.2307/3213534

Abstract

Families of alternative bandit processes have been used as models for problems in a variety of areas. Optimal strategies for these decision processes are determined by dynamic allocation indices. These indices are here shown to play an important role in the evaluation of suboptimal strategies.


