
On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

Published online by Cambridge University Press: 21 March 2016

D. J. Hodge*
Affiliation: The University of Nottingham
K. D. Glazebrook**
Affiliation: Lancaster University

* Postal address: School of Mathematical Sciences, University Park, The University of Nottingham, Nottingham NG7 2RD, UK. Email address: david.hodge@nottingham.ac.uk
** Postal address: Lancaster University Management School, Bailrigg, Lancaster LA1 4YX, UK.

Abstract


The class of restless bandits proposed by Whittle (1988) has long been known to be intractable. This paper presents an optimality result which extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent works of Glazebrook et al. (2011a) and Hodge and Glazebrook (2011b), which discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that, under key assumptions about the nature of solutions to a deterministic differential equation, the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.
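To make the contrast between the purchased and the constrained resource concrete, here is a minimal Python sketch of a greedy index allocation. It assumes, hypothetically, that each bandit k in its current state comes with a list `indices[k]`, where `indices[k][a]` is a marginal index for raising the bandit from activation level a to a + 1. It further assumes these indices are non-increasing in a, a simplification in the spirit of full indexability. The function names `lagrangian_levels` and `greedy_index_allocation` and this input format are illustrative assumptions, not the paper's notation or its exact policy.

```python
import heapq
from typing import List


def lagrangian_levels(indices: List[List[float]], price: float) -> List[int]:
    """Relaxed problem: resource is bought at `price` per unit (no hard cap).

    Assuming each bandit's marginal indices are non-increasing in the
    activation level, bandit k keeps raising its level while the next
    marginal index exceeds the price, so its level is the count of such indices.
    """
    return [sum(1 for w in row if w > price) for row in indices]


def greedy_index_allocation(indices: List[List[float]], capacity: int) -> List[int]:
    """Constrained problem: hand out at most `capacity` units of resource.

    Each unit goes to the bandit whose next marginal index is largest;
    ties go to the lower bandit number. Allocation stops when capacity
    runs out or no bandit can be raised further.
    """
    levels = [0] * len(indices)
    # heapq is a min-heap, so store negated indices to pop the largest first.
    heap = [(-row[0], k) for k, row in enumerate(indices) if row]
    heapq.heapify(heap)
    remaining = capacity
    while remaining > 0 and heap:
        _, k = heapq.heappop(heap)
        levels[k] += 1
        remaining -= 1
        nxt = levels[k]
        # Offer this bandit's next marginal index, if it has a higher level.
        if nxt < len(indices[k]):
            heapq.heappush(heap, (-indices[k][nxt], k))
    return levels


# Hypothetical marginal indices for three bandits in their current states.
example = [[5.0, 2.0], [4.0, 1.0], [3.0]]
print(greedy_index_allocation(example, 3))  # [1, 1, 1]
relaxed = lagrangian_levels(example, 1.5)
print(relaxed)  # [2, 1, 1]
print(greedy_index_allocation(example, sum(relaxed)))  # [2, 1, 1]
```

In this toy example the greedy rule, given exactly the capacity that the relaxed solution chooses to buy at price 1.5, reproduces the relaxed allocation. The paper addresses the much subtler question of how such heuristics perform as the number of bandits grows, which this sketch does not attempt.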

Type
General Applied Probability
Copyright
Copyright © Applied Probability Trust 2015 

References

Archibald, T. W., Black, D. P. and Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operat. Res. 57, 314–326.
Ayesta, U., Jacko, P. and Novak, V. (2011). A nearly-optimal index rule for scheduling of users with abandonment. In Proc. IEEE INFOCOM, IEEE, New York, pp. 2849–2857.
Caro, F. and Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Manag. Sci. 53, 276–292.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–177.
Gittins, J. C., Glazebrook, K. D. and Weber, R. R. (2011). Multi-Armed Bandit Allocation Indices. John Wiley, Oxford.
Glazebrook, K. D., Hodge, D. J. and Kirkbride, C. (2011a). General notions of indexability for queueing control and asset management. Ann. Appl. Prob. 21, 876–907.
Glazebrook, K. D., Kirkbride, C. and Ouenniche, J. (2009). Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operat. Res. 57, 975–989.
Glazebrook, K. D., Mitchell, H. M. and Ansell, P. S. (2005). Index policies for the maintenance of a collection of machines by a set of repairmen. Europ. J. Operat. Res. 165, 267–284.
Glazebrook, K. D., Niño-Mora, J. and Ansell, P. S. (2002). Index policies for a class of discounted restless bandits. Adv. Appl. Prob. 34, 754–774.
Glazebrook, K. D., Ansell, P. S., Dunn, R. T. and Lumley, R. R. (2004). On the optimal allocation of service to impatient tasks. J. Appl. Prob. 41, 51–72.
Hodge, D. J. and Glazebrook, K. D. (2011b). Dynamic resource allocation in a multi-product make-to-stock production system. Queueing Systems 67, 333–364.
Mitra, D. and Weiss, A. (1988). A transient analysis of a data network with a processor-sharing switch. AT&T Tech. J. 67, 4–16.
Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198.
Opp, M., Glazebrook, K. and Kulkarni, V. G. (2005). Outsourcing warranty repairs: dynamic allocation. Naval Res. Logistics 52, 381–398.
Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Math. Operat. Res. 24, 293–305.
Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.
Veatch, M. H. and Wein, L. M. (1996). Scheduling a make-to-stock queue: index policies and hedging points. Operat. Res. 44, 634–647.
Weber, R. (2007). Comments on: ‘Dynamic priority allocation via restless bandit marginal productivity indices’. TOP 15, 211–216.
Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637–648.
Weber, R. R. and Weiss, G. (1991). Addendum to: ‘On an index policy for restless bandits’. Adv. Appl. Prob. 23, 429–430.
Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25), Applied Probability Trust, Sheffield, pp. 287–298.