Bayesian dynamic programming

Published online by Cambridge University Press: 01 July 2016

Ulrich Rieder*
Affiliation:
University of Hamburg

Abstract

We consider a non-stationary Bayesian dynamic decision model with general state, action and parameter spaces. It is shown that this model can be reduced to a non-Markovian (resp. Markovian) decision model with completely known transition probabilities. Under rather weak convergence assumptions on the expected total rewards some general results are presented concerning the restriction on deterministic generalized Markov policies, the criteria of optimality and the existence of Bayes policies. These facts are based on the above transformations and on results of Hinderer and Schäl.
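
The reduction to a model with completely known transition probabilities can be illustrated on a toy problem. The sketch below is not the construction used in the paper; it assumes a finite parameter set, a finite horizon, and a hypothetical two-action problem (a safe action with known payoff and a risky action whose success probability is unknown). All names (`THETAS`, `SAFE_REWARD`, `predictive`, `posterior`, `bayes_value`) are invented for illustration. The belief over the parameter is carried as part of the state, so every transition probability in the enlarged model can be computed from the belief alone.

```python
from functools import lru_cache

# Hypothetical toy instance (not from the paper): one risky action whose
# success probability theta is unknown, and one safe action with known payoff.
THETAS = (0.2, 0.8)   # candidate parameter values (assumed finite)
SAFE_REWARD = 0.5     # payoff of the safe action; it reveals nothing about theta


def predictive(belief):
    """Success probability of the risky action under the current belief."""
    return sum(b * t for b, t in zip(belief, THETAS))


def posterior(belief, success):
    """Bayes update of the belief after one observed outcome of the risky action."""
    likelihoods = [t if success else 1.0 - t for t in THETAS]
    weights = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(weights)
    return tuple(w / total for w in weights)


@lru_cache(maxsize=None)
def bayes_value(belief, horizon):
    """Optimal expected total reward when the belief is treated as the state."""
    if horizon == 0:
        return 0.0
    # Safe action: known reward, belief unchanged.
    safe = SAFE_REWARD + bayes_value(belief, horizon - 1)
    # Risky action: the transition law of the belief is fully known,
    # because it only depends on the predictive probability p.
    p = predictive(belief)
    risky = (p * (1.0 + bayes_value(posterior(belief, True), horizon - 1))
             + (1.0 - p) * bayes_value(posterior(belief, False), horizon - 1))
    return max(safe, risky)
```

Under these assumptions, `bayes_value((0.5, 0.5), 2)` evaluates to about 1.09, which exceeds the 1.0 obtained by playing the safe action twice: trying the risky action first is worth more because the updated belief improves the second decision.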

Type
Research Article
Copyright
Copyright © Applied Probability Trust 1975 

References

Aoki, M. (1967) Optimization of Stochastic Systems. Academic Press, New York.
Bauer, H. (1968) Wahrscheinlichkeitstheorie und Grundzüge der Masstheorie. Walter de Gruyter and Co., Berlin.
Bellman, R. (1961) Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton.
Billingsley, P. (1968) Convergence of Probability Measures. John Wiley, New York.
Blackwell, D. (1965) Discounted dynamic programming. Ann. Math. Statist. 36, 226–235.
Blackwell, D. (1967) Positive dynamic programming. Proc. Fifth Berkeley Symp. Math. Statist. Prob. 1, 415–418.
DeGroot, M. H. (1970) Optimal Statistical Decisions. McGraw-Hill, New York.
Ferguson, T. S. (1967) Mathematical Statistics: A Decision Theoretical Approach. Academic Press, New York.
Furukawa, N. (1967) A Bayes controlled process. Mem. Fac. Science (Kyushu University) 21, 249–258.
Furukawa, N. (1970) Fundamental theorems in a Bayes controlled process. Bull. Math. Statist. 14, 103–110.
Hinderer, K. (1970) Foundations of non-stationary dynamic programming with discrete time parameter. Lect. Notes in Operat. Res. and Math. Systems, Vol. 33. Springer, Berlin.
Hinderer, K. (1971) Instationäre dynamische Optimierung bei schwachen Voraussetzungen über die Gewinnfunktionen. Abh. Math. Sem. Univ. Hamburg 36, 208–223.
Martin, J. J. (1967) Bayesian Decision Problems and Markov Chains. John Wiley, New York.
Rhenius, D. (1971) Markoffsche Entscheidungsmodelle mit unvollständiger Information und Anwendungen in der Lerntheorie. Doctoral dissertation, University of Hamburg.
Schäl, M. (1974) On non-stationary continuous dynamic programming with discrete time parameter. Research report, University of Bonn.
Strauch, R. E. (1966) Negative dynamic programming. Ann. Math. Statist. 37, 871–890.
Sworder, D. D. (1966) Optimal Adaptive Systems. Academic Press, New York.
Wessels, J. (1968) Decision Rules in Markovian Decision Processes with Incompletely Known Transition Probabilities. Doctoral dissertation, TH Eindhoven.
White, D. J. (1969) Dynamic Programming. Oliver and Boyd, Edinburgh and London.
Whittle, P. (1969) Sequential decision processes with essential unobservables. Adv. Appl. Prob. 1, 271–287.
Yakowitz, S. J. (1969) Mathematics of Adaptive Control Processes. Elsevier, New York.