First Passage Optimality and Variance Minimisation of Markov Decision Processes with Varying Discount Factors

Xiao Wu; Xianping Guo

doi:10.1239/jap/1437658608

Part of: Stochastic systems and control; Markov processes

Published online by Cambridge University Press: 30 January 2018

Xiao Wu: Sun Yat-Sen University
Xianping Guo: Sun Yat-Sen University
Postal address (both authors): School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P. R. China.

Abstract

Core share and HTML view are not available for this content. However, as you have access to this content, a full PDF is available via the ‘Save PDF’ action button.

This paper deals with the first passage optimality and variance minimisation problems of discrete-time Markov decision processes (MDPs) with varying discount factors and unbounded rewards/costs. First, under suitable conditions slightly weaker than those in the previous literature on the standard (infinite-horizon) discounted MDPs, we establish the existence and characterisation of the first passage expected-optimal stationary policies. Second, to further distinguish the expected-optimal stationary policies, we introduce the variance minimisation problem, prove that it is equivalent to a new first passage optimality problem of MDPs, and thus show the existence of a variance-optimal policy that minimises the variance over the set of all first passage expected-optimal stationary policies. Finally, we use a computable example to illustrate our main results and also to show the difference between the first passage optimality here and the standard discount optimality of MDPs in the previous literature.
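The abstract does not restate the model, so the following Python sketch works under stated assumptions rather than the paper's full setting (which allows unbounded rewards): a finite state space, finitely many actions per state, a hypothetical state-action-dependent discount factor alpha(x, a) in (0, 1), bounded rewards r(x, a), and rewards that stop accruing at the first passage time into a target set B (value 0 on B). It computes V* by value iteration on V*(x) = max_a [r(x, a) + alpha(x, a) * sum over y not in B of P(y | x, a) V*(y)], collects the maximising actions A*(x), and then minimises the second moment over stationary policies that use only those actions. Every such policy has mean V*, so its second moment M satisfies M(x) = r^2 + 2 r alpha (P V*)(x) + alpha^2 (P M)(x). That is a first-passage problem with reward r^2 + 2 r alpha (P V*) and discount alpha^2, and minimising M minimises the variance M - (V*)^2. This is one natural reading of the equivalence the abstract describes, not necessarily the authors' construction; all function names and the toy model are hypothetical.

```python
# Sketch (hypothetical model): first-passage expected-optimal and variance-minimal
# stationary policies for a finite MDP with state-action-dependent discount factors.
#   P[x][a]     -> dict {next_state: probability}
#   r[x][a]     -> bounded one-step reward
#   alpha[x][a] -> discount factor in (0, 1)
#   target      -> set B; rewards stop accruing at the first passage time into B.


def q_value(P, r, alpha, target, V, x, a):
    # Reward of action a in x plus the discounted continuation value (0 once B is hit).
    cont = sum(p * V[y] for y, p in P[x][a].items() if y not in target)
    return r[x][a] + alpha[x][a] * cont


def optimal_values(P, r, alpha, target, tol=1e-12):
    # Value iteration for V*(x) = max_a q(x, a); V* = 0 on B. Converges since alpha < 1.
    V = {x: 0.0 for x in P}
    while True:
        new_V = {x: 0.0 if x in target
                 else max(q_value(P, r, alpha, target, V, x, a) for a in P[x])
                 for x in P}
        done = max(abs(new_V[x] - V[x]) for x in P) < tol
        V = new_V
        if done:
            return V


def optimal_actions(P, r, alpha, target, V, tol=1e-9):
    # A*(x): actions attaining the maximum in the optimality equation (up to tol).
    return {x: [a for a in P[x] if q_value(P, r, alpha, target, V, x, a) >= V[x] - tol]
            for x in P if x not in target}


def second_moment_q(P, r, alpha, target, V, M, x, a):
    # One step of M(x) = r^2 + 2 r alpha (P V)(x) + alpha^2 (P M)(x), sums over y not in B.
    pv = sum(p * V[y] for y, p in P[x][a].items() if y not in target)
    pm = sum(p * M[y] for y, p in P[x][a].items() if y not in target)
    ra, al = r[x][a], alpha[x][a]
    return ra * ra + 2.0 * ra * al * pv + al * al * pm


def min_second_moment(P, r, alpha, target, V, A_star, tol=1e-12):
    # First-passage minimisation with reward r^2 + 2 r alpha (P V*) and discount alpha^2,
    # restricted to the expected-optimal actions A*(x).
    M = {x: 0.0 for x in P}
    while True:
        new_M = {x: 0.0 if x in target
                 else min(second_moment_q(P, r, alpha, target, V, M, x, a) for a in A_star[x])
                 for x in P}
        done = max(abs(new_M[x] - M[x]) for x in P) < tol
        M = new_M
        if done:
            policy = {x: min(A_star[x],
                             key=lambda a: second_moment_q(P, r, alpha, target, V, M, x, a))
                      for x in A_star}
            return M, policy


def evaluate_policy(P, r, alpha, target, f):
    # Mean and variance of the first-passage discounted reward under a stationary policy f,
    # obtained by restricting the model to the actions chosen by f.
    P_f = {x: (P[x] if x in target else {f[x]: P[x][f[x]]}) for x in P}
    V_f = optimal_values(P_f, r, alpha, target)
    M_f, _ = min_second_moment(P_f, r, alpha, target, V_f, {x: [f[x]] for x in f})
    return V_f, {x: M_f[x] - V_f[x] ** 2 for x in P}


# Toy model: in state 0, take a sure reward of 1 (action 0) or gamble (action 1); B = {2}.
P = {0: {0: {2: 1.0}, 1: {1: 0.5, 2: 0.5}}, 1: {0: {2: 1.0}}, 2: {0: {2: 1.0}}}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 4.0}, 2: {0: 0.0}}
alpha = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5}, 2: {0: 0.5}}
B = {2}

V = optimal_values(P, r, alpha, B)
A_star = optimal_actions(P, r, alpha, B, V)
M, f_var = min_second_moment(P, r, alpha, B, V, A_star)
print(V[0], A_star[0], f_var[0], M[0] - V[0] ** 2)  # 1.0 [0, 1] 0 0.0
print(evaluate_policy(P, r, alpha, B, {0: 1, 1: 0})[1][0])  # 1.0
```

In the toy model both actions in state 0 are expected-optimal with mean 1, but the gamble has variance 1 while the sure reward has variance 0, so the variance criterion separates two policies that the expected first-passage criterion cannot.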

Keywords

Discrete-time Markov decision process; varying discount factor; unbounded reward; first passage optimality; variance minimisation

MSC classification

Primary: 90C40: Markov and semi-Markov decision processes

Secondary: 93E20: Optimal stochastic control; 60J27: Continuous-time Markov processes on discrete state spaces

Type: Research Article
Information: Journal of Applied Probability, Volume 52, Issue 2, June 2015, pp. 441–456

DOI: https://doi.org/10.1239/jap/1437658608
Copyright: © Applied Probability Trust
