
The Expected Total Cost Criterion for Markov Decision Processes under Constraints: A Convex Analytic Approach

  • François Dufour (a1), M. Horiguchi (a2) and A. B. Piunovskiy (a3)

Abstract

This paper deals with discrete-time Markov decision processes (MDPs) under constraints in which all the objectives take the form of an expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, that the model is nonnegative and semicontinuous, and that the associated linear program admits an admissible solution with finite cost. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to the Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate the theoretical issues and possible applications of the results developed in the paper.
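To fix ideas, the convex analytic approach reformulates the constrained control problem as a linear program over occupation measures. The following is a minimal sketch of a standard formulation of such a program; the notation (state space X, action space A, transition kernel Q, initial distribution ν, objective cost c_0, constraint costs c_1, …, c_q with bounds d_1, …, d_q) is ours and is not necessarily the paper's.

\begin{aligned}
\operatorname*{minimize}_{\mu}\quad & \int_{X\times A} c_{0}\,d\mu \\
\text{subject to}\quad & \mu(\Gamma\times A)=\nu(\Gamma)+\int_{X\times A} Q(\Gamma\mid x,a)\,\mu(dx,da) && \text{for every Borel set } \Gamma\subseteq X,\\
& \int_{X\times A} c_{k}\,d\mu\le d_{k}, && k=1,\dots,q,\\
& \mu\ \text{a nonnegative measure on } X\times A.
\end{aligned}

In this sketch, an optimal measure μ*, when it exists, induces a randomized stationary policy by disintegration, μ*(dx, da) = \hat{\mu}^{*}(dx)\,\pi(da \mid x), where \hat{\mu}^{*} denotes the marginal of μ* on X; this is the sense in which an optimal occupation measure generates an optimal randomized stationary policy in the results summarized above.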



Corresponding author

∗ Postal address: INRIA Bordeaux Sud-ouest, CQFD Team, 351 cours de la Libération, F-33400 Talence, France. Email address: dufour@math.u-bordeaux1.fr
∗∗ Postal address: Department of Mathematics, Faculty of Engineering, Kanagawa University, 3-27-1 Rokkakubashi, Kanagawa-ku, Yokohama 221-8686, Japan. Email address: horiguchi@kanagawa-u.ac.jp
∗∗∗ Postal address: Department of Mathematical Sciences, University of Liverpool, Liverpool L69 7ZL, UK. Email address: piunov@liverpool.ac.uk

