Restless bandits: activity allocation in a changing world

P. Whittle

doi:10.2307/3214163

Abstract

We consider a population of n projects which in general continue to evolve whether in operation or not (although by different rules). It is desired to choose the projects in operation at each instant of time so as to maximise the expected rate of reward, under a constraint upon the expected number of projects in operation. The Lagrange multiplier associated with this constraint defines an index which reduces to the Gittins index when projects not being operated are static. If one is constrained to operate m projects exactly then arguments are advanced to support the conjecture that, for m and n large in constant ratio, the policy of operating the m projects of largest current index is nearly optimal. The index is evaluated for some particular projects.
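The selection rule in the conjecture, operating the m projects of largest current index, can be sketched as follows. This is only an illustration: the function name `select_active` and the sample index values are hypothetical, and the sketch assumes the index of each project in its current state has already been computed by some other means (the abstract does not give a formula for it).

```python
from typing import List, Sequence


def select_active(indices: Sequence[float], m: int) -> List[int]:
    """Return the positions of the m projects with the largest current index.

    `indices[i]` is assumed to hold the already-computed index of project i
    in its current state; computing that value is outside this sketch.
    """
    if not 0 <= m <= len(indices):
        raise ValueError("m must lie between 0 and the number of projects")
    # Rank project positions by index value, largest first.
    # Ties are broken by the lower position (an arbitrary choice made here).
    ranked = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return sorted(ranked[:m])


# Hypothetical index values for n = 5 projects, with m = 2 to be operated.
current_index = [0.3, 1.2, -0.5, 0.9, 0.9]
print(select_active(current_index, 2))  # [1, 3]
```

Because the projects are restless, the index values change whether or not a project is operated, so a rule of this kind would be re-applied at every decision epoch.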

References

Gittins, J. C. (1979) Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–164.

Gittins, J. C. and Jones, D. M. (1974) A dynamic allocation index for the sequential design of experiments. In Progress in Statistics, ed. Gani, J., North-Holland, Amsterdam, 241–266.

Weiss, G. (1987) Approximation results in parallel machines stochastic scheduling. Presented at the Twelfth Symposium on Operations Research, Passau.

Whittle, P. (1980) Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 142–149.

Whittle, P. (1981) Arm-acquiring bandits. Ann. Prob. 9, 284–292.

Whittle, P. (1984) Optimal routing in Jackson networks. Asia-Pacific J. Operat. Res. 1, 32–37.

Whittle, P. (1986) Systems in Stochastic Equilibrium. Wiley, Chichester.

Crossref Citations

This article has been cited by the following publications. This list is generated based on data provided by Crossref.

Van Oyen, M.P. and Teneketzis, D. 1992. Optimal stochastic scheduling of connected queues with switching costs. p. 3328.

Papadimitriou, C.H. and Tsitsiklis, J.N. 1994. The complexity of optimal queueing network control. p. 318.

Pandelis, D.G. and Teneketzis, D. 1995. On the optimality of the Gittins index rule in multi-armed bandits with multiple plays. Vol. 2, p. 1408.

Righter, Rhonda and Shanthikumar, J. George 1998. Independently Expiring Multiarmed Bandits. Probability in the Engineering and Informational Sciences, Vol. 12, Issue. 4, p. 453.

Papadimitriou, Christos H. and Tsitsiklis, John N. 1999. The Complexity of Optimal Queuing Network Control. Mathematics of Operations Research, Vol. 24, Issue. 2, p. 293.

Lott, C. and Teneketzis, D. 1999. Multi-channel allocation in single-hop mobile networks with priorities. Vol. 4, p. 3550.

Bertsimas, Dimitris and Niño-Mora, José 2000. Restless Bandits, Linear Programming Relaxations, and a Primal-Dual Index Heuristic. Operations Research, Vol. 48, Issue. 1, p. 80.

Blondel, Vincent D. and Tsitsiklis, John N. 2000. A survey of computational complexity results in systems and control. Automatica, Vol. 36, Issue. 9, p. 1249.

O'Meara, T. and Patel, A. 2001. A topic-specific Web robot model based on restless bandits. IEEE Internet Computing, Vol. 5, Issue. 2, p. 27.

Dusonchet, F. and Hongler, M.-O. 2001. Dynamic scheduling of a multi-items production operating on a make-to-stock basis. Vol. 1, p. 549.

Whittle, Peter 2002. Applied Probability in Great Britain. Operations Research, Vol. 50, Issue. 1, p. 227.

Washburn, R.B., Schneider, M.K. and Fox, J.J. 2002. Stochastic dynamic programming based approaches to sensor resource management. Vol. 1, p. 608.

Glazebrook, K.D. and Mitchell, H.M. 2002. An index policy for a stochastic scheduling model with improving/deteriorating jobs. Naval Research Logistics (NRL), Vol. 49, Issue. 7, p. 706.

Raissi-Dehkordi, M. and Baras, J.S. 2002. Broadcast scheduling in information delivery systems. Vol. 3, p. 2935.

Dusonchet, F. and Hongler, M. 2003. Continuous-time restless bandit and dynamic scheduling for make-to-stock production. IEEE Transactions on Robotics and Automation, Vol. 19, Issue. 6, p. 977.

Keller, Godfrey and Oldale, Alison 2003. Branching bandits: a sequential search process with correlated pay-offs. Journal of Economic Theory, Vol. 113, Issue. 2, p. 302.

Glazebrook, K.D. and Kirkbride, C. 2004. Index policies for the routing of background jobs. Naval Research Logistics (NRL), Vol. 51, Issue. 6, p. 856.

Jun, Tackseung 2004. A survey on the bandit problem with switching costs. De Economist, Vol. 152, Issue. 4, p. 513.

Ehsan, N. and Liu, Mingyan 2004. On the optimality of an index policy for bandwidth allocation with delayed state observation and differentiated services. Vol. 3, p. 1974.

Goyal, M., Kumar, A. and Sharma, V. 2005. Delay optimal control algorithm for a multiaccess fading channel with peak power constraint. p. 2060.
