On an index policy for restless bandits

Richard R. Weber; Gideon Weiss

doi:10.2307/3214547

Published online by Cambridge University Press: 14 July 2016

Richard R. Weber*, University of Cambridge
Gideon Weiss**, Georgia Institute of Technology
*Postal address: Cambridge University Engineering Department, Management Studies Group, Mill Lane, Cambridge CB2 1RX, UK.
**Postal address: School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA.

Abstract

We investigate the optimal allocation of effort to a collection of n projects. The projects are ‘restless’ in that the state of a project evolves in time, whether or not it is allocated effort. The evolution of the state of each project follows a Markov rule, but transitions and rewards depend on whether or not the project receives effort. The objective is to maximize the expected time-average reward under a constraint that exactly m of the n projects receive effort at any one time. We show that as m and n tend to ∞ with m/n fixed, the per-project reward of the optimal policy is asymptotically the same as that achieved by a policy which operates under the relaxed constraint that an average of m projects be active. The relaxed constraint was considered by Whittle (1988) who described how to use a Lagrangian multiplier approach to assign indices to the projects. He conjectured that the policy of allocating effort to the m projects of greatest index is asymptotically optimal as m and n tend to ∞. We show that the conjecture is true if the differential equation describing the fluid approximation to the index policy has a globally stable equilibrium point. This need not be the case, and we present an example for which the index policy is not asymptotically optimal. However, numerical work suggests that such counterexamples are extremely rare and that the size of the suboptimality which one might expect is minuscule.
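
The index policy described in the abstract (activate the m projects of greatest index, while every project keeps evolving) can be illustrated with a minimal simulation sketch. Everything named here is an assumption made for illustration: the function names, the transition matrices `p_active` and `p_passive`, the reward vectors, and above all the `index` values, which are supplied as given numbers rather than derived by Whittle's Lagrangian construction. The sketch assumes identical projects on a small finite state space in discrete time, and is not the authors' method of analysis.

```python
import random

def top_m_by_index(states, index, m):
    # Rank projects by the index of their current state (highest first)
    # and activate exactly m of them; ties are broken by project order.
    ranked = sorted(range(len(states)), key=lambda i: index[states[i]], reverse=True)
    return set(ranked[:m])

def step(states, active, p_active, p_passive, rng):
    # Every project moves, active or not: that is what makes it "restless".
    # The transition row depends on whether the project receives effort.
    nxt = []
    for i, s in enumerate(states):
        row = p_active[s] if i in active else p_passive[s]
        nxt.append(rng.choices(range(len(row)), weights=row)[0])
    return nxt

def average_reward(n, m, index, p_active, p_passive,
                   r_active, r_passive, horizon, seed=0):
    # Simulate the index policy and return the per-project
    # time-average reward over a finite horizon.
    rng = random.Random(seed)
    states = [0] * n  # assumption: all projects start in state 0
    total = 0.0
    for _ in range(horizon):
        active = top_m_by_index(states, index, m)
        total += sum(r_active[s] if i in active else r_passive[s]
                     for i, s in enumerate(states))
        states = step(states, active, p_active, p_passive, rng)
    return total / (horizon * n)
```

Calling `average_reward` with increasing n and m at a fixed ratio m/n gives a rough numerical feel for the per-project reward that the asymptotic result concerns, although a finite simulation of this kind cannot by itself confirm or refute asymptotic optimality.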

Keywords

FLUID APPROXIMATIONS; GITTINS INDEX; LARGE DEVIATION THEORY; MULTI-ARMED BANDIT PROBLEM; STOCHASTIC SCHEDULING

Type: Research Papers
Information: Journal of Applied Probability, Volume 27, Issue 3, September 1990, pp. 637–648

DOI: https://doi.org/10.2307/3214547

References

Freidlin, M. I. and Ventsel, A. D. (1984) Random Perturbations of Dynamical Systems. Springer-Verlag, New York.

Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J., North-Holland, Amsterdam, 241–266.

Mitra, D. and Weiss, A. (1988) A fluid limit of a closed queueing network with applications to data networks.

Whittle, P. (1988) Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability, ed. Gani, J., J. Appl. Prob. 25A, 287–298.
