
Estimation and control in Markov chains

Published online by Cambridge University Press: 01 July 2016

P. Mandl*
Affiliation:
Institute of Information Theory and Automation, Czechoslovak Academy of Sciences

Abstract

We consider a finite controlled Markov chain whose description depends on an unknown parameter a, and we investigate the following control policy. To each value of a, an optimal stationary control is associated. The parameter a is estimated recursively from the observed trajectory by the minimum contrast method, and the optimal stationary control corresponding to the current estimate is applied. We present asymptotic properties of the estimate and of the criterion function. These follow from the law of large numbers and the central limit theorem for controlled Markov chains, both derived with the aid of martingales.
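The policy described in the abstract is an adaptive scheme: estimate, then act as if the estimate were the true parameter. The Python sketch below illustrates it under loudly stated assumptions that are not taken from the paper: a hypothetical two-state, two-action chain; a finite candidate set for a; the negative log-likelihood as the contrast function (maximum likelihood is one particular minimum contrast estimate); and the long-run average reward as the criterion, with relative value iteration computing the optimal stationary control for each candidate. The names `transition`, `reward`, `optimal_control`, `simulate` and `CANDIDATES` are illustrative only.

```python
import math
import random

# Hypothetical toy model (an assumption, not the paper's model): 2 states,
# 2 actions, and a finite set of candidate values for the unknown parameter a.
STATES = (0, 1)
ACTIONS = (0, 1)
CANDIDATES = (0.2, 0.5, 0.8)


def transition(i, u, a):
    """Return [P(next = 0), P(next = 1)] for state i, action u, parameter a.

    Assumed form: action 1 moves to state 1 with probability a, action 0
    with probability 1 - a (the state i is ignored in this toy model).
    """
    p1 = a if u == 1 else 1.0 - a
    return [1.0 - p1, p1]


def reward(i, u):
    """Assumed one-step reward: 1 for being in state 1, minus 0.1 for action 1."""
    return i - 0.1 * u


def optimal_control(a, iters=500):
    """Optimal stationary control for parameter a under the long-run average
    reward criterion (assumed), computed by relative value iteration."""
    h = [0.0 for _ in STATES]
    q = {}
    for _ in range(iters):
        q = {
            (i, u): reward(i, u)
            + sum(p * h[j] for j, p in enumerate(transition(i, u, a)))
            for i in STATES
            for u in ACTIONS
        }
        new_h = [max(q[(i, u)] for u in ACTIONS) for i in STATES]
        h = [v - new_h[0] for v in new_h]  # normalise relative to state 0
    # The stationary control maps each state to its maximising action.
    return tuple(max(ACTIONS, key=lambda u: q[(i, u)]) for i in STATES)


def simulate(true_a, steps, seed=0):
    """Estimate a by minimum contrast (negative log-likelihood) along the
    trajectory and apply the stationary control optimal for the estimate."""
    rng = random.Random(seed)
    controls = {a: optimal_control(a) for a in CANDIDATES}
    contrast = {a: 0.0 for a in CANDIDATES}  # running contrast per candidate
    x, estimate, total = 0, CANDIDATES[0], 0.0
    for _ in range(steps):
        u = controls[estimate][x]
        total += reward(x, u)
        # Draw the next state from the true (unknown to the controller) model.
        y = 1 if rng.random() < transition(x, u, true_a)[1] else 0
        # Recursive update: add the contrast of the new transition (x, u, y).
        for a in CANDIDATES:
            contrast[a] -= math.log(transition(x, u, a)[y])
        estimate = min(CANDIDATES, key=lambda a: contrast[a])
        x = y
    return estimate, total / steps
```

For this toy model, `simulate(0.8, 5000)` should settle on the estimate 0.8 and report an average reward near 0.7: once the estimate is correct, action 1 is always used, the chain is in state 1 with probability 0.8, and the action cost 0.1 is paid at every step. Restricting a to a finite set keeps the minimization trivial; a continuous parameter would require a numerical minimizer instead.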

Type: Research Article
Copyright © Applied Probability Trust 1974

