On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

D. J. Hodge; K. D. Glazebrook

doi:10.1239/aap/1444308876

Part of: Hamilton-Jacobi theories, including dynamic programming; Numerical methods in calculus of variations and optimal control; Stochastic systems and control

Published online by Cambridge University Press: 21 March 2016

D. J. Hodge∗, The University of Nottingham
K. D. Glazebrook∗∗, Lancaster University

∗ Postal address: School of Mathematical Sciences, University Park, The University of Nottingham, Nottingham NG7 2RD, UK. Email address: david.hodge@nottingham.ac.uk
∗∗ Postal address: Lancaster University Management School, Bailrigg, Lancaster LA1 4YX, UK.

Abstract

The class of restless bandits proposed by Whittle (1988) has long been known to be intractable. This paper presents an optimality result which extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent work of Glazebrook et al. (2011a), (2011b), who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that, under key assumptions about the nature of solutions to a deterministic differential equation, the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.
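
To illustrate what a greedy index heuristic under an overall resource constraint might look like, the following is a minimal Python sketch and not the authors' formulation. It assumes that every bandit, in its current state, reports a hypothetical list of index values, one for each successive increase in its activation level, that each increase consumes one unit of resource, and that the whole budget is spent while increases remain available. The function name `greedy_index_allocation` and the layout of `indices` are illustrative assumptions.

```python
import heapq
from typing import List, Sequence


def greedy_index_allocation(
    indices: Sequence[Sequence[float]], budget: int
) -> List[int]:
    """Greedily assign activation levels to bandits under a resource budget.

    indices[b][a] is a hypothetical index for raising bandit b from
    activation level a to level a + 1 in its current state (an assumption
    of this sketch). Each raise is assumed to cost one unit of resource.
    """
    levels = [0] * len(indices)

    # heapq is a min-heap, so store negated indices to pop the largest first.
    heap = [(-idx[0], b) for b, idx in enumerate(indices) if len(idx) > 0]
    heapq.heapify(heap)

    remaining = budget
    while remaining > 0 and heap:
        _, b = heapq.heappop(heap)
        levels[b] += 1
        remaining -= 1
        next_level = levels[b]
        # Offer this bandit's next raise, if it has one left.
        if next_level < len(indices[b]):
            heapq.heappush(heap, (-indices[b][next_level], b))

    return levels


# Three bandits, two possible raises each, three units of resource.
example_indices = [[5.0, 1.0], [4.0, 3.0], [2.0, 0.5]]
allocation = greedy_index_allocation(example_indices, budget=3)
```

In this example the three units go to the raises with indices 5.0, 4.0 and 3.0, so the second bandit is activated at a higher level than the first and the third is left passive.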

Keywords

Index heuristic; asymptotic optimality; multi-action restless bandit; stochastic resource allocation

MSC classification

Primary: 90C40: Markov and semi-Markov decision processes

Secondary: 49L20: Dynamic programming method; 49M20: Methods of relaxation type; 93E20: Optimal stochastic control

Type: General Applied Probability
Information: Advances in Applied Probability, Volume 47, Issue 3, September 2015, pp. 652–667

DOI: https://doi.org/10.1239/aap/1444308876
Copyright: Copyright © Applied Probability Trust 2015

References

Archibald, T. W., Black, D. P. and Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operat. Res. 57, 314–326.

Ayesta, U., Jacko, P. and Novak, V. (2011). A nearly-optimal index rule for scheduling of users with abandonment. In Proc. IEEE INFOCOM, IEEE, New York, pp. 2849–2857.

Caro, F. and Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Manag. Sci. 53, 276–292.

Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–177.

Gittins, J. C., Glazebrook, K. D. and Weber, R. R. (2011). Multi-Armed Bandit Allocation Indices. John Wiley, Oxford.

Glazebrook, K. D., Hodge, D. J. and Kirkbride, C. (2011a). General notions of indexability for queueing control and asset management. Ann. Appl. Prob. 21, 876–907.

Glazebrook, K. D., Kirkbride, C. and Ouenniche, J. (2009). Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operat. Res. 57, 975–989.

Glazebrook, K. D., Mitchell, H. M. and Ansell, P. S. (2005). Index policies for the maintenance of a collection of machines by a set of repairmen. Europ. J. Operat. Res. 165, 267–284.

Glazebrook, K. D., Niño-Mora, J. and Ansell, P. S. (2002). Index policies for a class of discounted restless bandits. Adv. Appl. Prob. 34, 754–774.

Glazebrook, K. D., Ansell, P. S., Dunn, R. T. and Lumley, R. R. (2004). On the optimal allocation of service to impatient tasks. J. Appl. Prob. 41, 51–72.

Hodge, D. J. and Glazebrook, K. D. (2011b). Dynamic resource allocation in a multi-product make-to-stock production system. Queueing Systems 67, 333–364.

Mitra, D. and Weiss, A. (1988). A transient analysis of a data network with a processor-sharing switch. AT&T Tech. J. 67, 4–16.

Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198.

Opp, M., Glazebrook, K. and Kulkarni, V. G. (2005). Outsourcing warranty repairs: dynamic allocation. Naval Res. Logistics 52, 381–398.

Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Math. Operat. Res. 24, 293–305.

Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.

Veatch, M. H. and Wein, L. M. (1996). Scheduling a make-to-stock queue: index policies and hedging points. Operat. Res. 44, 634–647.

Weber, R. (2007). Comments on: ‘Dynamic priority allocation via restless bandit marginal productivity indices’. TOP 15, 211–216.

Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637–648.

Weber, R. R. and Weiss, G. (1991). Addendum to: ‘On an index policy for restless bandits’. Adv. Appl. Prob. 23, 429–430.

Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25), Applied Probability Trust, Sheffield, pp. 287–298.
