Computing Optimal Policies for Markovian Decision Processes Using Simulation

Apostolos N. Burnetas and Michael N. Katehakis

Abstract

A simulation method is developed for computing average-reward optimal policies for a Markovian decision process with finite state and action spaces. It is shown that the method is consistent; i.e., it produces solutions arbitrarily close to optimal. Various types of estimation errors and confidence bounds are examined. Finally, it is shown that the probability distribution of the number of simulation cycles required to compute an ε-optimal policy satisfies a large deviations property.
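The abstract gives no algorithmic detail, so the following is a minimal, hypothetical Python sketch of the general idea only: simulate transitions of a finite MDP to estimate its transition law, then solve the estimated model by relative value iteration to obtain an approximately average-reward optimal policy. The toy MDP, the sample sizes, and all function names are illustrative assumptions; this is not the authors' method, and in particular the paper's cycle-based simulation scheme and its confidence bounds are not reproduced here.

```python
# Hypothetical sketch: model estimation by simulation + relative value iteration.
# Nothing here reproduces the paper's actual algorithm or its error bounds.
import numpy as np

rng = np.random.default_rng(0)

# Toy finite MDP (an assumption for illustration): S states, A actions.
S, A = 4, 2
P_true = rng.dirichlet(np.ones(S), size=(S, A))  # P_true[s, a] = distribution over next states
r_true = rng.uniform(0.0, 1.0, size=(S, A))      # one-step rewards, assumed known below

def estimate_transitions(n_samples=2000):
    """Estimate the transition kernel by simulating each (state, action) pair."""
    P_hat = np.zeros((S, A, S))
    for s in range(S):
        for a in range(A):
            next_states = rng.choice(S, size=n_samples, p=P_true[s, a])
            P_hat[s, a] = np.bincount(next_states, minlength=S) / n_samples
    return P_hat

def relative_value_iteration(P, r, tol=1e-8, max_iter=10_000):
    """Greedy average-reward policy from relative value iteration on a model (P, r)."""
    h = np.zeros(S)                      # relative value function
    for _ in range(max_iter):
        Q = r + P @ h                    # Q[s, a] = r[s, a] + sum_s' P[s, a, s'] * h[s']
        h_new = Q.max(axis=1)
        h_new -= h_new[0]                # pin h[0] = 0 so the iterates stay bounded
        if np.max(np.abs(h_new - h)) < tol:
            h = h_new
            break
        h = h_new
    return Q.argmax(axis=1)              # greedy policy w.r.t. the supplied model

P_hat = estimate_transitions()
policy = relative_value_iteration(P_hat, r_true)
print("Policy computed from simulated data:", policy)
```

As the per-pair sample size grows, P_hat converges to P_true, so the policy computed from the estimated model is optimal for the true model with high probability; this is the kind of consistency the abstract asserts. The paper's actual guarantees (estimation-error analysis, confidence bounds, and the large deviations property for the number of simulation cycles) depend on its specific simulation scheme, which this sketch does not attempt to capture.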


