Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part I Stochastic Models and Bayesian Filtering
- Part II Partially Observed Markov Decision Processes: Models and Applications
- Part III Partially Observed Markov Decision Processes: Structural Results
- Part IV Stochastic Approximation and Reinforcement Learning
- Appendix A Short primer on stochastic simulation
- Appendix B Continuous-time HMM filters
- Appendix C Markov processes
- Appendix D Some limit theorems
- References
- Index
Preface
Published online by Cambridge University Press: 05 April 2016
Summary
This book aims to provide an accessible treatment of partially observed Markov decision processes (POMDPs) to researchers and graduate students in electrical engineering, computer science and applied mathematics. “Accessible” means that, apart from certain parts in Part IV of the book, only an engineering version of probability theory is required as background. That is, measure theoretic probability is not required and statistical limit theorems are only used informally.
Contributions to POMDPs have been made by several communities: operations research, robotics, machine learning, speech recognition, artificial intelligence, control systems theory, and economics. POMDPs arise in numerous applications, including controlled sensing, wireless communications, machine learning, control systems, social learning and sequential detection.
Stochastic models and Bayesian state estimation (filtering) are essential ingredients in POMDPs. As a result, this book is organized into four parts:
• Part I: Stochastic models and Bayesian filtering is an introductory treatment of Bayesian filtering. The aim is to provide, in a concise manner, material essential for POMDPs.
• Part II: Partially Observed Markov Decision Processes: models and algorithms deals with the formulation and algorithms for solving POMDPs, together with examples in controlled sensing.
• Part III: Partially Observed Markov Decision Processes: structural results constitutes the core of this book. It consists of six chapters (Chapters 9 to 14) that use lattice programming methods to characterize the structure of the optimal policy without brute-force computation.
• Part IV: Stochastic approximation and reinforcement learning deals with stochastic gradient algorithms, simulation-based gradient estimation and reinforcement learning algorithms for MDPs and POMDPs.
Appendix A is a self-contained elementary description of stochastic simulation.
Appendix B gives a short description of continuous-time HMM filters and their link to discrete-time filtering algorithms.
The abstraction of POMDPs comes alive with applications. This book contains several examples, ranging from target tracking in Bayesian filtering to optimal search, risk measures, active sensing, adaptive radars and social learning.
Typical applications of POMDPs assume a finite action space, with the state (belief) space being the unit simplex. Such problems are not riddled with technicalities, and sophisticated measurable selection theorems are not required.
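To make the notion of the belief simplex concrete, here is a minimal sketch (not taken from the book) of a one-step Bayesian (HMM) filter update for a finite-state Markov chain; the transition matrix `P`, observation likelihoods `B_y` and prior below are made-up example values. The point it illustrates is that the posterior belief always remains on the unit simplex: nonnegative components summing to one.

```python
def hmm_filter_update(belief, P, B_y):
    """One Bayesian filtering step: Markov prediction, then measurement update.

    belief : prior probability over the states (a point on the unit simplex)
    P      : transition matrix, P[i][j] = prob of moving from state i to j
    B_y    : likelihood of the received observation in each state
    """
    n = len(belief)
    # Prediction step: pi_pred[j] = sum_i belief[i] * P[i][j]
    pi_pred = [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]
    # Measurement update: weight by observation likelihoods, then normalize,
    # which returns the posterior to the unit simplex.
    unnorm = [B_y[j] * pi_pred[j] for j in range(n)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-state chain observed through a noisy sensor.
P = [[0.9, 0.1],
     [0.2, 0.8]]
B_y = [0.7, 0.2]        # observation favors state 0
belief = [0.5, 0.5]     # uniform prior

posterior = hmm_filter_update(belief, P, B_y)
print(posterior)        # nonnegative components that sum to 1
```

In a POMDP, this posterior (rather than the hidden state itself) is the information state on which the controller acts, which is why the belief simplex serves as the state space.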
Courses: This book has been taught in several types of graduate courses.
At UBC, Parts I and II, together with some topics in Part IV, are taught in a thirteen-week course with thirty-nine lecture hours.
Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. xi–xiv. Publisher: Cambridge University Press. Print publication year: 2016.