Book contents
- Frontmatter
- Contents
- Preface
- 1 Introduction
- Part I Stochastic Models and Bayesian Filtering
- Part II Partially Observed Markov Decision Processes: Models and Applications
- Part III Partially Observed Markov Decision Processes: Structural Results
- 9 Structural results for Markov decision processes
- 10 Structural results for optimal filters
- 11 Monotonicity of value function for POMDPs
- 12 Structural results for stopping time POMDPs
- 13 Stopping time POMDPs for quickest change detection
- 14 Myopic policy bounds for POMDPs and sensitivity to model parameters
- Part IV Stochastic Approximation and Reinforcement Learning
- Appendix A Short primer on stochastic simulation
- Appendix B Continuous-time HMM filters
- Appendix C Markov processes
- Appendix D Some limit theorems
- References
- Index
13 - Stopping time POMDPs for quickest change detection
from Part III - Partially Observed Markov Decision Processes: Structural Results
Published online by Cambridge University Press: 05 April 2016
Summary
Chapter 12 presented three structural results for stopping time POMDPs: convexity of the stopping region (for linear costs), the existence of a threshold switching curve for the optimal policy (under suitable conditions), and a characterization of the optimal linear threshold policy. This chapter discusses several examples of stopping time POMDPs in quickest change detection. We will show that, for these examples, convexity of the stopping set and threshold optimal policies arise naturally. The structural results of Chapter 12 therefore serve as a unifying theme and give substantial insight into what might otherwise be considered a collection of sequential detection methods.
This chapter considers the following extensions of quickest change detection:
• Example 1: Quickest change detection with phase-distributed change time: classical quickest detection is equivalent to a stopping time POMDP where the underlying Markov chain jumps only once into an absorbing state (therefore the jump time is geometrically distributed). How should quickest change detection be performed when the change time is phase-distributed and the stopping cost is quadratic in the belief state to penalize the variance in the state estimate?
• Example 2: Quickest transient detection: if the state of nature jumps into a state and then jumps out of that state, how should quickest detection of this transient be performed? The problem is equivalent to a stopping time POMDP where the Markov chain jumps only twice.
• Example 3: Risk-sensitive quickest detection: how should quickest detection be performed with an exponential penalty?
• Example 4: Quickest detection with social learning: if individual agents learn an underlying state by performing social learning, how can quickest change detection be applied by a global decision-maker? As will be shown, this interaction of local and global decision-makers results in interesting non-monotone behavior, and the stopping set is not necessarily convex.
• Example 5: Quickest time herding with social learning: how should a decision-maker estimate an underlying state of nature when agents herd while performing social learning?
• Example 6: How should a monopoly optimally price a product when customers perform social learning? Each time a customer buys the product, the monopoly makes money and also gets publicity due to social learning. It is shown that it is optimal to start at a high price and then decrease the price over time.
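To make the classical setting behind Example 1 concrete, the following is a minimal sketch of quickest detection viewed as a stopping time POMDP: a two-state Markov chain jumps once into an absorbing post-change state, the belief (posterior probability that the change has occurred) is updated by the HMM filter, and a threshold policy stops when the belief is high enough. The Gaussian observation model, the means 0 and 1, the change probability 0.05, and the threshold 0.9 are illustrative assumptions, not values from the chapter; in particular, the threshold here is chosen arbitrarily rather than computed as optimal.

```python
import math
import random


def gaussian_pdf(y, mean, std=1.0):
    """Density of N(mean, std^2) at y (assumed observation model)."""
    return math.exp(-0.5 * ((y - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def belief_update(pi, y, p_change, mean_pre=0.0, mean_post=1.0):
    """One step of the two-state HMM filter.

    pi is the posterior probability that the change has already occurred.
    The post-change state is absorbing, so the change time is geometric.
    """
    # Prediction: from the pre-change state the chain jumps with prob. p_change.
    pred = pi + (1.0 - pi) * p_change
    # Bayes update with the observation likelihoods.
    num = pred * gaussian_pdf(y, mean_post)
    den = num + (1.0 - pred) * gaussian_pdf(y, mean_pre)
    return num / den


def run_threshold_policy(p_change=0.05, threshold=0.9, seed=0, max_steps=10_000):
    """Simulate one sample path and stop when the belief crosses the threshold.

    Returns (change_time, stop_time, final_belief); stop_time is None if the
    threshold is never reached within max_steps.
    """
    rng = random.Random(seed)
    changed = False
    change_time = None
    pi = 0.0  # start certain that no change has occurred
    for k in range(1, max_steps + 1):
        if not changed and rng.random() < p_change:
            changed = True
            change_time = k
        y = rng.gauss(1.0 if changed else 0.0, 1.0)
        pi = belief_update(pi, y, p_change)
        if pi >= threshold:
            return change_time, k, pi
    return change_time, None, pi


if __name__ == "__main__":
    change_time, stop_time, belief = run_threshold_policy()
    print("change at", change_time, "stopped at", stop_time, "belief", round(belief, 3))
```

Lowering the threshold in this sketch shortens the detection delay at the price of more false alarms, which is the trade-off that the stopping cost in a quickest detection POMDP is designed to balance.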
- Type: Chapter
- Information: Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing, pp. 284-311. Publisher: Cambridge University Press. Print publication year: 2016