
The variance of discounted Markov decision processes

Published online by Cambridge University Press:  14 July 2016

Matthew J. Sobel*
Affiliation:
Georgia Institute of Technology
* Postal address: College of Management, Georgia Institute of Technology, Atlanta, GA 30332, U.S.A.

Abstract

Formulae are presented for the variance and higher moments of the present value of single-stage rewards in a finite Markov decision process. Similar formulae are exhibited for a semi-Markov decision process. There is a short discussion of the obstacles to using the variance formula in algorithms to maximize the mean minus a multiple of the standard deviation.
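
To illustrate the kind of relation the abstract refers to (this is a sketch under stated assumptions, not necessarily the paper's exact statement), consider a fixed stationary policy on a finite state space with single-stage reward vector $r$, transition matrix $P$, and discount factor $0 < \beta < 1$. Writing $X_i$ for the discounted total reward starting in state $i$, the mean $v_i = \mathbb{E}[X_i]$ and variance $\psi_i = \operatorname{Var}(X_i)$ satisfy
\[
  v_i \;=\; r_i + \beta \sum_j P_{ij}\, v_j ,
\]
\[
  \psi_i \;=\; \beta^2 \Bigl( \sum_j P_{ij}\, \psi_j
        \;+\; \sum_j P_{ij}\, v_j^{2}
        \;-\; \bigl( \textstyle\sum_j P_{ij}\, v_j \bigr)^{2} \Bigr).
\]
In matrix form the second relation reads $\psi = \beta^{2} P \psi + \beta^{2}\bigl(P(v \circ v) - (Pv)\circ(Pv)\bigr)$ with $\circ$ denoting the componentwise product, so that once $v$ has been computed from $v = r + \beta P v$, the variance vector $\psi$ is obtained from a second linear system of the same size.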

Type
Research Papers
Copyright
Copyright © Applied Probability Trust 1982 
