Hostname: page-component-7479d7b7d-qlrfm Total loading time: 0 Render date: 2024-07-10T21:29:01.082Z Has data issue: false hasContentIssue false

Cost rate heuristics for semi-Markov decision processes

Published online by Cambridge University Press:  14 July 2016

K. D. Glazebrook*
Affiliation:
University of Newcastle upon Tyne
Michael P. Bailey*
Affiliation:
Naval Postgraduate School, Monterey
Lyn R. Whitaker*
Affiliation:
Naval Postgraduate School, Monterey
*
Postal address: Department of Mathematics and Statistics, The University, Newcastle upon Tyne NE1 7RU, UK.
∗∗Postal address: Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943, USA.
∗∗Postal address: Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943, USA.

Abstract

In response to the computational complexity of the dynamic programming/backwards induction approach to the development of optimal policies for semi-Markov decision processes, we propose a class of heuristics resulting from an inductive process which proceeds forwards in time. These heuristics always choose actions in such a way as to minimize some measure of the current cost rate. We describe a procedure for calculating such cost rate heuristics. The quality of the performance of such policies is related to the speed of evolution (in a cost sense) of the process. A simple model of preventive maintenance is described in detail. Cost rate heuristics for this problem are calculated and assessed computationally.

MSC classification

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1992 

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Research supported by the National Research Council by means of a Senior Research Associateship at the Department of Operations Research, Naval Postgraduate School, Monterey, California.

Dr Bailey was supported by the Naval Weapons Support Centre, Crane, IN, and Dr Whitaker by the Naval Postgraduate School Research Foundation.

References

Albright, S. C. (1978) Structural results for partially observable Markov decision processes. Operat. Res. 27, 10411053.Google Scholar
Aras, G. and Whitaker, L. R. (1990) Sequential Nonparametric Estimation of an Age Replacement Policy. Technical Report, Department of Operations Research, Naval Postgraduate School, Monterey, California.Google Scholar
Aven, T. (1983) Optimal replacement under a minimal repair strategy - a general failure model. Adv. Appl. Prob. 15, 198211.Google Scholar
Aven, T. and Bergman, B. (1986) Optimal replacement times - a general setup. J. Appl. Prob. 23, 432442.Google Scholar
Bather, J. A. (1977) On the sequential construction of an optimal age replacement policy. Bull. Inst. Internat. Statist. 47, 253266.Google Scholar
Blackwell, D. (1965) Discounted dynamic programming. Ann. Math. Statist. 36, 226235.Google Scholar
Chen, C. S. and Savits, T. H. (1988) A discounted cost relationship. J. Multivariate Anal. 27, 105115.Google Scholar
Frees, E. and Ruppert, D. (1985) Sequential nonparametric age replacement policies. Ann. Statist. 13, 650662.CrossRefGoogle Scholar
Gittins, J. C. (1989) Multi-Armed Bandit Allocation Indices. Wiley, Chichester.Google Scholar
Glazebrook, K. D. (1991) Strategy evaluation for stochastic scheduling problems with order constraints. Adv. Appl. Prob. 23, 86104.Google Scholar
Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.Google Scholar
Katehakis, M. N. and Veinott, A. F. (1987) The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262268.Google Scholar
Porteus, E. (1980) Overview of iterative methods for discounted finite Markov and semi-Markov decision chains. In Recent Developments in Markov Decision Processes, ed. Hartley, R., Thomas, L. C. and White, D. J., 120.Google Scholar
Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar
White, C. C. (1979) Bounds on optimal cost for a replacement problem with partial observations. Nav. Res. Logist. Quart. 26, 415422.Google Scholar