Strong consistency of a modified maximum likelihood estimator for controlled Markov chains

Bharat Doshi; Steven E. Shreve

doi:10.2307/3212966

Abstract

A controlled Markov chain with finite state space has transition probabilities which depend on an unknown parameter α lying in a known finite set A. For each α, a stationary control law ϕ_α is given. This paper develops a control scheme whereby at each stage t a parameter α_t is chosen at random from among those parameters which nearly maximize the log likelihood function, and the control u_t is chosen according to the control law ϕ_{α_t}. It is proved that this algorithm leads to identification of the true α under conditions weaker than any previously considered.
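The scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' construction: the abstract does not say how "nearly maximize" is defined, so the fixed tolerance `eps` below is an assumption, as are the names `near_maximizers`, `adaptive_run`, `MODELS`, and `LAWS` and the two-state example chain. The sketch assumes every transition probability is strictly positive so that the log likelihood stays finite.

```python
import math
import random


def near_maximizers(loglik, eps):
    """Return the parameters whose log likelihood is within eps of the maximum.

    `eps` is a hypothetical tolerance; the paper's precise rule is not given
    in the abstract.
    """
    best = max(loglik.values())
    return [a for a in sorted(loglik) if loglik[a] >= best - eps]


def adaptive_run(models, laws, true_alpha, steps, eps, rng, x0=0):
    """Simulate the randomized near-maximum-likelihood control scheme.

    models[a][u][x] is the transition row from state x under control u when
    the parameter is a; laws[a][x] is the control that law phi_a uses in x.
    """
    loglik = {a: 0.0 for a in models}  # running log likelihood per parameter
    x = x0
    estimates = []
    for _ in range(steps):
        # Choose alpha_t at random among the near-maximizers.
        alpha_t = rng.choice(near_maximizers(loglik, eps))
        # Apply the control prescribed by the law attached to alpha_t.
        u = laws[alpha_t][x]
        # The chain moves according to the true (unknown) parameter.
        row = models[true_alpha][u][x]
        y = rng.choices(range(len(row)), weights=row)[0]
        # Update every candidate's log likelihood with the observed transition.
        for a, P in models.items():
            loglik[a] += math.log(P[u][x][y])
        x = y
        estimates.append(alpha_t)
    return estimates, loglik


# Hypothetical example: two states, two controls, parameter set A = {"a", "b"}.
MODELS = {
    "a": [[[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.3, 0.7]]],
    "b": [[[0.6, 0.4], [0.4, 0.6]], [[0.4, 0.6], [0.6, 0.4]]],
}
LAWS = {"a": [0, 0], "b": [1, 1]}

if __name__ == "__main__":
    est, ll = adaptive_run(MODELS, LAWS, "a", 500, 1.0, random.Random(0))
    print("last estimates:", est[-10:])
    print("log likelihoods:", ll)
```

Keeping a running log likelihood per parameter avoids rescanning the whole history at each stage. Randomizing among the near-maximizers, rather than always taking a single maximizer, means that control laws attached to other plausible parameters still get used from time to time.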

Footnotes

Research sponsored in part by the Air Force Office of Scientific Research (AFSC), United States Air Force, under Contract F-49620–79-C-0165.


