
Average Optimality for Continuous-Time Markov Decision Processes Under Weak Continuity Conditions

Published online by Cambridge University Press: 30 January 2018

Yi Zhang*

Affiliation: University of Liverpool

* Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Email address: yi.zhang@liv.ac.uk
Rights & Permissions [Opens in a new window]

Abstract


This paper considers average optimality for a continuous-time Markov decision process in Borel state and action spaces, with an arbitrarily unbounded nonnegative cost rate. The existence of a deterministic stationary optimal policy is proved under conditions that allow the following: the controlled process can be explosive, the transition rates are weakly continuous, and the multifunction defining the admissible action spaces can be neither compact-valued nor upper semicontinuous.
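For context, a standard formulation of the long-run expected average cost criterion is sketched below in illustrative notation (the symbols J, x, π, ξ_t, a_t, and c are assumptions here and may differ from the paper's own): ξ_t denotes the controlled state process under a policy π with initial state x, a_t the action in force at time t, and c the nonnegative cost rate.

\[
J(x,\pi) \;=\; \limsup_{T\to\infty} \frac{1}{T}\, \mathbb{E}_x^{\pi}\!\left[\int_0^T c(\xi_t,a_t)\,\mathrm{d}t\right],
\qquad
J^*(x) \;=\; \inf_{\pi} J(x,\pi).
\]

A policy π* is then average optimal if J(x,π*) = J*(x) for every initial state x; the paper's main result asserts that such a policy can be chosen deterministic and stationary under the weak continuity conditions described above.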

Type: Research Article

Copyright: © Applied Probability Trust
