Cost rate heuristics for semi-Markov decision processes

K. D. Glazebrook; Michael P. Bailey; Lyn R. Whitaker

doi:10.2307/3214900

Cost rate heuristics for semi-Markov decision processes

Published online by Cambridge University Press: 14 July 2016

K. D. Glazebrook ,

Michael P. Bailey and

Lyn R. Whitaker

Show author details

K. D. Glazebrook*: Affiliation:
University of Newcastle upon Tyne
Michael P. Bailey*: Affiliation:
Naval Postgraduate School, Monterey
Lyn R. Whitaker*: Affiliation:
Naval Postgraduate School, Monterey
*: ∗Postal address: Department of Mathematics and Statistics, The University, Newcastle upon Tyne NE1 7RU, UK.
∗∗Postal address: Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943, USA.
∗∗Postal address: Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943, USA.

Article contents

Abstract
Footnotes
References

Get access

Rights & Permissions

Abstract

In response to the computational complexity of the dynamic programming/backwards induction approach to the development of optimal policies for semi-Markov decision processes, we propose a class of heuristics resulting from an inductive process which proceeds forwards in time. These heuristics always choose actions in such a way as to minimize some measure of the current cost rate. We describe a procedure for calculating such cost rate heuristics. The quality of the performance of such policies is related to the speed of evolution (in a cost sense) of the process. A simple model of preventive maintenance is described in detail. Cost rate heuristics for this problem are calculated and assessed computationally.

Keywords

DYNAMIC PROGRAMMING REPLACEMENT POLICY

MSC classification

Primary: 90C39: Dynamic programming

Type: Research Papers
Information: Journal of Applied Probability , Volume 29 , Issue 3 , September 1992 , pp. 633 - 644

DOI: https://doi.org/10.2307/3214900 [Opens in a new window]
Copyright: Copyright © Applied Probability Trust 1992

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Footnotes

Research supported by the National Research Council by means of a Senior Research Associateship at the Department of Operations Research, Naval Postgraduate School, Monterey, California.

Dr Bailey was supported by the Naval Weapons Support Centre, Crane, IN, and Dr Whitaker by the Naval Postgraduate School Research Foundation.

References

Albright, S. C. (1978) Structural results for partially observable Markov decision processes. Operat. Res. 27, 1041–1053.Google Scholar

Aras, G. and Whitaker, L. R. (1990) Sequential Nonparametric Estimation of an Age Replacement Policy. Technical Report, Department of Operations Research, Naval Postgraduate School, Monterey, California.Google Scholar

Aven, T. (1983) Optimal replacement under a minimal repair strategy - a general failure model. Adv. Appl. Prob. 15, 198–211.Google Scholar

Aven, T. and Bergman, B. (1986) Optimal replacement times - a general setup. J. Appl. Prob. 23, 432–442.Google Scholar

Bather, J. A. (1977) On the sequential construction of an optimal age replacement policy. Bull. Inst. Internat. Statist. 47, 253–266.Google Scholar

Blackwell, D. (1965) Discounted dynamic programming. Ann. Math. Statist. 36, 226–235.Google Scholar

Chen, C. S. and Savits, T. H. (1988) A discounted cost relationship. J. Multivariate Anal. 27, 105–115.Google Scholar

Frees, E. and Ruppert, D. (1985) Sequential nonparametric age replacement policies. Ann. Statist. 13, 650–662.CrossRef Google Scholar

Gittins, J. C. (1989) Multi-Armed Bandit Allocation Indices. Wiley, Chichester.Google Scholar

Glazebrook, K. D. (1991) Strategy evaluation for stochastic scheduling problems with order constraints. Adv. Appl. Prob. 23, 86–104.Google Scholar

Howard, R. (1960) Dynamic Programming and Markov Processes. MIT Press, Cambridge, MA.Google Scholar

Katehakis, M. N. and Veinott, A. F. (1987) The multi-armed bandit problem: decomposition and computation. Math. Operat. Res. 12, 262–268.Google Scholar

Porteus, E. (1980) Overview of iterative methods for discounted finite Markov and semi-Markov decision chains. In Recent Developments in Markov Decision Processes, ed. Hartley, R., Thomas, L. C. and White, D. J., 1–20.Google Scholar

Ross, S. M. (1970) Applied Probability Models with Optimization Applications. Holden-Day, San Francisco.Google Scholar

White, C. C. (1979) Bounds on optimal cost for a replacement problem with partial observations. Nav. Res. Logist. Quart. 26, 415–422.Google Scholar

Article contents

Cost rate heuristics for semi-Markov decision processes

Abstract

Keywords

MSC classification

Access options

Footnotes

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests