OPTIMAL MIXING OF MARKOV DECISION RULES FOR MDP CONTROL

  • Dinard van der Laan

Abstract

In this article we study Markov decision process (MDP) problems with the restriction that, at decision epochs, only a finite number of given Markov decision rules are admissible. For example, the set of admissible Markov decision rules could consist of some easily implementable decision rules. Additionally, many open-loop control problems can be modeled as an MDP with such a restriction on the admissible decision rules. Within the class of available policies, optimal policies are generally nonstationary, and it is difficult to prove that a given policy is optimal. We give an example with two admissible decision rules, {d1, d2}, for which we conjecture that the nonstationary periodic Markov policy determined by its period cycle (d1, d1, d2, d1, d2, d1, d2, d1, d2) is optimal. This conjecture is supported by results that we obtain on the structure of optimal Markov policies in general. We also present numerical results that give additional support for the conjecture in the particular example we consider.
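To make the notion of a periodic Markov policy concrete, the following is a minimal sketch of how the long-run average cost of a policy that repeats a fixed cycle of decision rules might be evaluated numerically. Everything in it is an assumption for illustration: the two-state transition matrices `P`, the cost vectors `C`, and the helper names `vec_mat` and `average_cost` are hypothetical and are not taken from the article's example. The sketch also assumes that the chain obtained by applying one full cycle is ergodic, so that repeated sweeps converge to the periodic regime.

```python
# Hypothetical two-state model: one transition matrix and one cost vector
# per decision rule. These numbers are illustrative, not from the article.
P = {
    "d1": [[0.9, 0.1], [0.5, 0.5]],
    "d2": [[0.2, 0.8], [0.3, 0.7]],
}
C = {
    "d1": [1.0, 3.0],  # cost of being in state 0 / state 1 when d1 is applied
    "d2": [2.0, 0.5],
}

# Period cycle quoted in the abstract.
CYCLE = ("d1", "d1", "d2", "d1", "d2", "d1", "d2", "d1", "d2")


def vec_mat(v, m):
    """Multiply a row vector v by a matrix m (both plain Python lists)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def average_cost(cycle, P, C, sweeps=2000):
    """Approximate the long-run average cost per epoch of a periodic policy.

    Assumes the one-cycle chain is ergodic, so the state distribution at the
    start of a cycle converges under repeated sweeps (power iteration).
    """
    n = len(next(iter(P.values())))
    dist = [1.0 / n] * n  # arbitrary starting distribution
    for _ in range(sweeps):
        for d in cycle:
            dist = vec_mat(dist, P[d])
    # Accumulate the expected cost over one further cycle in the periodic regime.
    total = 0.0
    for d in cycle:
        total += sum(p * c for p, c in zip(dist, C[d]))
        dist = vec_mat(dist, P[d])
    return total / len(cycle)


if __name__ == "__main__":
    for name, cyc in [("d1 only", ("d1",)), ("d2 only", ("d2",)), ("periodic", CYCLE)]:
        print(name, average_cost(cyc, P, C))
```

Comparing the value for the periodic cycle with the two stationary choices shows how mixing rules can change the average cost. Such a comparison only evaluates candidate policies; it does not by itself establish optimality.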
