Semi-Markov decision processes with polynomial reward

Zvi Rosberg

doi:10.2307/3213482

Published online by Cambridge University Press: 14 July 2016

Zvi Rosberg*
Affiliation: Technion – Israel Institute of Technology
* Postal address: Department of Computer Science, Technion – Israel Institute of Technology, Haifa 32000, Israel.

Abstract

A semi-Markov decision process, with a denumerable multidimensional state space, is considered. At any given state only a finite number of actions can be taken to control the process. The immediate reward earned in one transition period is merely assumed to be bounded by a polynomial and a bound is imposed on a weighted moment of the next state reached in one transition. It is shown that under an ergodicity assumption there is a stationary optimal policy for the long-run average reward criterion. A queueing network scheduling problem, for which previous criteria are inapplicable, is given as an application.

Keywords

DYNAMIC PROGRAMMING; OPTIMAL STATIONARY POLICY; UNBOUNDED REWARD; AVERAGE COST; QUEUEING NETWORK

Type: Research Papers
Information: Journal of Applied Probability, Volume 19, Issue 2, June 1982, pp. 301–309

DOI: https://doi.org/10.2307/3213482
Copyright: © Applied Probability Trust 1982

Footnotes

This study was initiated in the course of the author's visit to the Department of Business Administration of the University of Illinois, and concluded with the support of the National Science Foundation under Grant ENG-7903879A01 in the course of his visit to the Department of EECS of the University of California at Berkeley.

References

[1] Federgruen, A., Hordijk, A. and Tijms, H. C. (1979) Denumerable state semi-Markov decision processes with unbounded costs, average cost criterion. Stoch. Proc. Appl. 9, 223–235.

[2] Fisher, L. (1968) On recurrent denumerable decision processes. Ann. Math. Statist. 39, 424–434.

[3] Lippman, S. A. (1973) Semi-Markov decision processes with unbounded reward. Management Sci. 19, 717–731.

[4] Lippman, S. A. (1975) On dynamic programming with unbounded rewards. Management Sci. 21, 1225–1233.

[5] Nenen, J. V. and Wessels, J. (1979) Successive approximations for Markov decision processes and Markov games with unbounded rewards. Math. Operationsforsch. Statist. Ser. Optimization 19, 431–455.

[6] Rosberg, Z. (1980) A positive recurrence criterion associated with multi-dimensional queueing networks. J. Appl. Prob. 17, 790–801.

[7] Rosberg, Z., Varaiya, P. and Walrand, J. (1982) Optimal control of service in tandem queues. IEEE Trans. Automatic Control AC-27.

[8] Ross, S. M. (1970) Average cost semi-Markov decision processes. J. Appl. Prob. 7, 649–659.
