Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part I Stochastic Models and Bayesian Filtering
- Part II Partially Observed Markov Decision Processes: Models and Applications
- Part III Partially Observed Markov Decision Processes: Structural Results
- Part IV Stochastic Approximation and Reinforcement Learning
- Appendix A Short primer on stochastic simulation
- Appendix B Continuous-time HMM filters
- Appendix C Markov processes
- Appendix D Some limit theorems
- References
- Index
Part IV - Stochastic Approximation and Reinforcement Learning
Published online by Cambridge University Press: 05 April 2016
Summary
Parts II and III of the book discussed stochastic dynamic programming for POMDPs. The aim was to determine the globally optimal policy. Part IV deals with stochastic gradient algorithms that converge to a local optimum. Such gradient algorithms are computationally efficient, unlike dynamic programming, which can be intractable. Furthermore, stochastic gradient algorithms form the basis for reinforcement learning – that is, they facilitate estimating the optimal policy when one does not know the parameters of the MDP or POMDP.
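To make the idea concrete, here is a minimal sketch (not from the book) of a stochastic gradient algorithm of the Robbins–Monro type: a parameter is updated with a noisy, unbiased gradient estimate and a decreasing step size, and converges to a local optimum. The quadratic objective and noise model are illustrative assumptions.

```python
import random

def stochastic_gradient_descent(grad, theta0, steps=5000, seed=0):
    """Robbins-Monro recursion: theta_{n+1} = theta_n - eps_n * noisy gradient.

    `grad(theta, rng)` returns a noisy but unbiased estimate of the
    gradient at theta; no model of the objective is needed.
    """
    rng = random.Random(seed)
    theta = theta0
    for n in range(1, steps + 1):
        eps = 1.0 / n  # step sizes with sum eps_n = inf, sum eps_n^2 < inf
        theta -= eps * grad(theta, rng)
    return theta

# Illustrative objective f(theta) = (theta - 2)^2; the noisy gradient is
# the true gradient 2*(theta - 2) corrupted by Gaussian noise.
noisy_grad = lambda th, rng: 2.0 * (th - 2.0) + rng.gauss(0.0, 0.5)
```

Despite only ever seeing noisy gradients, the iterates settle near the (here unique) minimizer theta = 2; the decreasing step size averages out the noise.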
Chapter 15 discusses gradient estimation for Markov processes via stochastic simulation. This forms the basis of gradient-based reinforcement learning.
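As a flavor of simulation-based gradient estimation, the following sketch (an illustrative example, not the book's code) implements the score-function (likelihood-ratio) estimator: the derivative of an expectation with respect to a parameter of the sampling distribution is estimated by weighting simulated costs with the score, so no derivative of the cost itself is required. The Gaussian family N(theta, 1) is an assumed example.

```python
import random

def score_function_gradient(f, theta, n=200000, seed=0):
    """Estimate d/dtheta E_theta[f(X)] for X ~ N(theta, 1) by simulation.

    Uses E_theta[f(X) * score(X)], where the score is
    d/dtheta log p_theta(X) = X - theta for the unit-variance Gaussian.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(theta, 1.0)
        total += f(x) * (x - theta)  # cost weighted by the score
    return total / n
```

For f(x) = x^2 one has E_theta[f(X)] = theta^2 + 1, so the true gradient at theta = 1 is 2; the simulation estimate concentrates around that value.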
Chapter 16 presents simulation-based stochastic approximation algorithms for estimating the optimal policy of MDPs when the transition probabilities are not known. These algorithms are also described in the context of POMDPs. The Q-learning algorithm and gradient-based reinforcement learning algorithms are presented.
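A minimal tabular Q-learning sketch illustrates the simulation-based theme: the algorithm only draws sample transitions and rewards, never the transition probabilities themselves. The two-state MDP below is a hypothetical example chosen for illustration.

```python
import random

def q_learning(sample_step, n_states, n_actions, gamma=0.9,
               steps=20000, eps=0.1, alpha=0.1, seed=0):
    """Tabular Q-learning driven purely by simulated transitions.

    `sample_step(s, a, rng)` returns (reward, next_state); the transition
    probabilities are never accessed directly.
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        # epsilon-greedy exploration
        if rng.random() < eps:
            a = rng.randrange(n_actions)
        else:
            a = max(range(n_actions), key=lambda b: Q[s][b])
        r, s2 = sample_step(s, a, rng)
        # stochastic approximation step toward the Bellman target
        target = r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
        s = s2
    return Q

# Hypothetical 2-state, 2-action MDP: action 1 reaches the rewarding
# state 1 with probability 0.9; action 0 always returns to state 0.
def sample_step(s, a, rng):
    s2 = 1 if (a == 1 and rng.random() < 0.9) else 0
    return (1.0 if s2 == 1 else 0.0), s2
```

After enough simulated steps, the greedy policy read off the learned Q-table chooses action 1 in both states, which is optimal for this example.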
Chapter 17 gives a brief description of convergence analysis of stochastic approximation algorithms. Examples given include recursive maximum likelihood estimation of HMM parameters, the least mean squares algorithm for estimating the state of an HMM (instead of a Bayesian HMM filter), discrete stochastic optimization algorithms, and mean field dynamics for approximating the dynamics of information flow in social networks.
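The least mean squares (LMS) algorithm mentioned above is itself a stochastic approximation recursion; a minimal scalar sketch (with an assumed linear model, not taken from the book) shows its form: the weight is nudged by the prediction error at each sample.

```python
import random

def lms(xs, ys, mu=0.05):
    """Scalar LMS recursion: w_{n+1} = w_n + mu * (y_n - x_n * w_n) * x_n.

    A constant-step-size stochastic gradient descent on the
    instantaneous squared prediction error.
    """
    w = 0.0
    for x, y in zip(xs, ys):
        w += mu * (y - x * w) * x
    return w

# Hypothetical data from the linear model y = 3*x + noise.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [3.0 * x + rng.gauss(0.0, 0.1) for x in xs]
```

Run on this data, the recursion tracks the true coefficient 3; the constant step size mu trades tracking speed against steady-state noise.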
Unlike Parts I to III, Part IV of the book is non-Bayesian (in the sense that Bayes’ formula is not the main theme).
- Type: Chapter
- Information: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. 341–342. Publisher: Cambridge University Press. Print publication year: 2016.