Skip to main content Accessibility help
×
Home

Evaluating policies for generalized bandits via a notion of duality

  • J. H. Crosbie (a1) and K. D. Glazebrook (a1)

Abstract

Nash's generalization of Gittins’ classic index result to so-called generalized bandit problems (GBPs) in which returns are dependent on the states of all arms (not only the one which is pulled) has proved important for applications. The index theory for special cases of this model in which all indices are positive is straightforward. However, this is not a natural restriction in practice. An earlier proposal for the general case did not yield satisfactory index-based suboptimality bounds for policies — a central feature of classical Gittins index theory. We develop such bounds via a notion of duality for GBPs which is of independent interest. The index which emerges naturally from this analysis is the reciprocal of the one proposed by Nash.

Copyright

Corresponding author

Postal address: Department of Statistics, University of Newcastle, Newcastle upon Tyne NE1 7RU, UK
∗∗ Email address: kevin.glazebrook@ncl.ac.uk

References

Hide All
Fay, N. A., and Glazebrook, K. D. (1987). On the scheduling of alternative stochastic jobs on a single machine. Adv. Appl. Prob. 19, 955973.
Fay, N. A., and Glazebrook, K. D. (1989). A general model of the scheduling of alternative tasks on a single machine. Proc. Eng. Inform. Sci. 3, 199221.
Fay, N. A., and Walrand, J. C. (1991). On approximately optimal index strategies for generalised arm problems. J. Appl. Prob. 28, 602612.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Statist. Soc. B, 41, 148177.
Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. John Wiley, Chichester.
Gittins, J. C., and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, eds Gani, J. and Vince, I. North Holland, Amsterdam, pp. 241266.
Glazebrook, K. D. (1982). On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Prob. 19, 716722.
Glazebrook, K. D., and Fay, N. A. (1988). Evaluating strategies for generalised bandit problems. Int. J. Syst. Sci. 19, 16051613.
Glazebrook, K. D., and Greatrix, S. (1993). On scheduling influential stochastic tasks on a single machine. Eur. J. Operat. Res. 70, 405424.
Glazebrook, K. D., and Greatrix, S. (1995). On transforming an index for generalised bandit problems. J. Appl. Prob. 32, 163182.
Glazebrook, K. D., and Owen, R. W. (1991). New results for generalised bandit problems. Int. J. Syst. Sci. 22, 479494.
Nash, P. (1973). Optimal allocation of resources between research projects. Ph.D. thesis, University of Cambridge.
Nash, P. (1980). A generalised bandit problem. J. R. Statist. Soc. B, 42, 165169.
Ross, S. M. (1983). Introduction to Stochastic Dynamic Programming. Academic Press, London.

Keywords

MSC classification

Related content

Powered by UNSILO

Evaluating policies for generalized bandits via a notion of duality

  • J. H. Crosbie (a1) and K. D. Glazebrook (a1)

Metrics

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Abstract views

Total abstract views: 0 *
Loading metrics...

* Views captured on Cambridge Core between <date>. This data will be updated every 24 hours.

Usage data cannot currently be displayed.