Ahmad, S. H. A. et al. (2009). Optimality of myopic sensing in multichannel opportunistic access. IEEE Trans. Inf. Theory 55, 4040–4050.
Altman, E. (1999). Constrained Markov Decision Processes. Chapman & Hall/CRC, Boca Raton, FL.
Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall, London.
Bertsimas, D. and Niño-Mora, J. (2000). Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Operat. Res. 48, 80–90.
Bradt, R. N., Johnson, S. M. and Karlin, S. (1956). On sequential designs for maximizing the sum of n observations. Ann. Math. Statist. 27, 1060–1074.
Caro, F. and Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Manag. Sci. 53, 276–292.
Cohen, K., Zhao, Q. and Scaglione, A. (2014). Restless multi-armed bandits under time-varying activation constraints for dynamic spectrum access. In 2014 48th Asilomar Conference on Signals, Systems and Computers, IEEE, pp. 1575–1578.
Deo, S. et al. (2013). Improving health outcomes through better capacity allocation in a community-based chronic care model. Operat. Res. 61, 1277–1294.
Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics (Budapest, 1972; Colloq. Math. Soc. János Bolyai 9), North-Holland, Amsterdam, pp. 241–266.
Gittins, J., Glazebrook, K. and Weber, R. (2011). Multi-Armed Bandit Allocation Indices, 2nd edn. John Wiley, Chichester.
Kelly, F. P. (1981). Multi-armed bandits with discount factor near one: the Bernoulli case. Ann. Statist. 9, 987–1001.
Le Ny, J., Dahleh, M. and Feron, E. (2008). A linear programming relaxation and a heuristic for the restless bandit problem with general switching costs. Preprint. Available at https://arxiv.org/abs/0805.1563v1.
Lee, E., Lavieri, M. S. and Volk, M. (2018). Optimal screening for hepatocellular carcinoma: a restless bandit model. Manufacturing & Service Operat. Manag. 21.
Mahajan, A. and Teneketzis, D. (2008). Multi-armed bandit problems. In Foundations and Applications of Sensor Management, Springer, Boston, MA, pp. 121–151.
Nain, P. and Ross, K. W. (1986). Optimal priority assignment with hard constraint. IEEE Trans. Automatic Control 31, 883–888.
Niño-Mora, J. (2011). Computing a classic index for finite-horizon bandits. INFORMS J. Comput. 23, 254–267.
Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Math. Operat. Res. 24, 293–305.
Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58, 527–535.
Schrijver, A. (2000). Theory of Linear and Integer Programming. John Wiley, New York.
Verloop, I. M. (2016). Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Prob. 26, 1947–1995.
Washburn, R. B. (2008). Application of multi-armed bandits to sensor management. In Foundations and Applications of Sensor Management, Springer, Boston, MA, pp. 153–175.
Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637–648.
Whittle, P. (1980). Multi-armed bandits and the Gittins index. J. R. Statist. Soc. B 42, 143–149.
Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25(A)), ed. Gani, J., Applied Probability Trust, Sheffield, pp. 287–298.
Zayas-Cabán, G., Jasin, S. and Wang, G. (2019). An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Supplementary material. Available at https://doi.org/10.1017/apr.2019.29.