
First Passage Optimality and Variance Minimisation of Markov Decision Processes with Varying Discount Factors

Xiao Wu and Xianping Guo

Abstract

This paper deals with the first passage optimality and variance minimisation problems of discrete-time Markov decision processes (MDPs) with varying discount factors and unbounded rewards/costs. First, under suitable conditions slightly weaker than those in the previous literature on standard (infinite-horizon) discounted MDPs, we establish the existence and characterisation of first passage expected-optimal stationary policies. Second, to further distinguish among the expected-optimal stationary policies, we introduce a variance minimisation problem, prove that it is equivalent to a new first passage optimality problem of MDPs, and thus show the existence of a variance-optimal policy, that is, one that minimises the variance over the set of all first passage expected-optimal stationary policies. Finally, we use a computable example to illustrate our main results and to show how the first passage optimality criterion studied here differs from the standard discount optimality criterion of MDPs in the previous literature.
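The two-stage idea in the abstract (first find the expected-optimal stationary policies, then minimise variance among them by solving another first passage problem) can be illustrated with a small sketch. It is not the paper's algorithm or notation; it assumes a finite model, a discount factor alpha(x, a) in (0, 1) depending on the current state and action, rewards collected only until the first entry into a target set, and deterministic stationary policies. The variance step uses the standard conditional-variance identity: for a policy that only picks expected-optimal actions, the variance W satisfies W(x) = c(x, a) + alpha(x, a)^2 * sum_y P(y|x, a) W(y), with c(x, a) equal to alpha(x, a)^2 times the conditional variance of V*(x_1). All function names, the tolerance `eps`, and the toy model are hypothetical.

```python
# Hypothetical finite-model sketch; not the paper's algorithm or notation.

def solve_first_passage(states, target, actions, P, reward, discount,
                        maximise=True, tol=1e-12, max_iter=100_000):
    """Value iteration for the first passage equation
    V(x) = opt_a [reward(x,a) + discount(x,a) * sum_y P(y|x,a) V(y)] off the target,
    V(x) = 0 on the target set, so rewards stop at the first passage time."""
    pick = max if maximise else min
    V = {x: 0.0 for x in states}
    for _ in range(max_iter):
        new_V = {
            x: 0.0 if x in target else pick(
                reward[x][a] + discount[x][a]
                * sum(p * V[y] for y, p in P[x][a].items())
                for a in actions[x])
            for x in states
        }
        diff = max(abs(new_V[x] - V[x]) for x in states)
        V = new_V
        if diff < tol:
            break
    return V


def variance_minimising_policy(states, target, actions, P, r, alpha, eps=1e-9):
    # Step 1: expected-optimal values V* and optimal action sets A*(x).
    V = solve_first_passage(states, target, actions, P, r, alpha)
    A_star = {}
    for x in states:
        if x in target:
            continue
        q = {a: r[x][a] + alpha[x][a] * sum(p * V[y] for y, p in P[x][a].items())
             for a in actions[x]}
        best = max(q.values())
        A_star[x] = [a for a in actions[x] if q[a] >= best - eps]

    # Step 2: one-step cost c(x,a) = alpha^2 * Var(V*(x_1) | x, a), discount alpha^2.
    cost, disc2 = {}, {}
    for x, acts in A_star.items():
        cost[x], disc2[x] = {}, {}
        for a in acts:
            m1 = sum(p * V[y] for y, p in P[x][a].items())
            m2 = sum(p * V[y] ** 2 for y, p in P[x][a].items())
            cost[x][a] = alpha[x][a] ** 2 * (m2 - m1 ** 2)
            disc2[x][a] = alpha[x][a] ** 2

    # Step 3: minimising the variance is itself a first passage problem
    # (minimise, actions restricted to A*).
    W = solve_first_passage(states, target, A_star, P, cost, disc2, maximise=False)
    policy = {
        x: min(acts, key=lambda a: cost[x][a] + disc2[x][a]
               * sum(p * W[y] for y, p in P[x][a].items()))
        for x, acts in A_star.items()
    }
    return V, W, policy


def example_model():
    """Hypothetical 3-state model: two actions in state 0 with equal expected value."""
    states, target = [0, 1, 2], {2}
    actions = {0: ["safe", "risky"], 1: ["stay"]}
    P = {0: {"safe": {2: 1.0}, "risky": {1: 0.5, 2: 0.5}}, 1: {"stay": {2: 1.0}}}
    r = {0: {"safe": 1.0, "risky": 0.0}, 1: {"stay": 4.0}}
    alpha = {0: {"safe": 0.9, "risky": 0.5}, 1: {"stay": 0.5}}
    return states, target, actions, P, r, alpha


if __name__ == "__main__":
    V, W, policy = variance_minimising_policy(*example_model())
    print(V[0], W[0], policy[0])  # 1.0 0.0 safe
```

In the toy model both actions in state 0 have expected first passage reward 1, but `risky` yields 2 or 0 with equal probability (variance 1), so the variance step selects `safe`, whose reward is deterministic.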

Postal address: School of Mathematics and Computational Science, Sun Yat-Sen University, Guangzhou, P. R. China.
Email addresses: jxwuxiao@126.com; mcsgxp@mail.sysu.edu.cn
