On the policy improvement algorithm for ergodic risk-sensitive control

Published online by Cambridge University Press: 02 September 2020

Ari Arapostathis
Affiliation:
Department of Electrical and Computer Engineering, The University of Texas at Austin, EER 7.824, Austin, TX 78712 (ari@utexas.edu)
Anup Biswas
Affiliation:
Department of Mathematics, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India (anup@iiserpune.ac.in)
Somnath Pradhan
Affiliation:
Department of Mathematics, Indian Institute of Science Education and Research, Dr. Homi Bhabha Road, Pune 411008, India (somnath@iiserpune.ac.in)

Abstract

In this article we consider the ergodic risk-sensitive control problem for a large class of multidimensional controlled diffusions on the whole space. We study the minimization and maximization problems under either a blanket stability hypothesis, or a near-monotone assumption on the running cost. We establish the convergence of the policy improvement algorithm for these models. We also present a more general result concerning the region of attraction of the equilibrium of the algorithm.

Type
Research Article
Copyright
Copyright © The Author(s) 2020. Published by Cambridge University Press on behalf of The Royal Society of Edinburgh
