Stopping time POMDPs for quickest change detection

Vikram Krishnamurthy

doi:10.1017/CBO9781316471104.017

Chapter 12 presented three structural results for stopping time POMDPs: convexity of the stopping region (for linear costs), the existence of a threshold switching curve for the optimal policy (under suitable conditions), and a characterization of the optimal linear threshold policy. This chapter discusses several examples of stopping time POMDPs in quickest change detection. We will show that, for most of these examples, convexity of the stopping set and threshold optimal policies arise naturally. The structural results of Chapter 12 therefore serve as a unifying theme and give substantial insight into what might otherwise be regarded as a disparate collection of sequential detection methods.
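As a concrete illustration of a threshold policy on the belief state, the following sketch implements classical quickest change detection (geometric change time) as a two-state stopping time POMDP. In the two-state case the belief is a scalar, so a convex stopping set containing the post-change vertex is an interval, and the policy reduces to "stop once the posterior probability of change exceeds a threshold". Everything specific here is an assumption for illustration, not taken from the chapter: Gaussian observations with unit variance and means `mu0` (pre-change) and `mu1` (post-change), a per-step change probability `p_change`, and a hypothetical threshold of 0.9 (the optimal threshold would come from solving the POMDP, which this code does not do).

```python
import math


def gaussian_pdf(y, mean, sigma=1.0):
    """Density of N(mean, sigma^2) evaluated at y."""
    z = (y - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def update_belief(pi_change, y, p_change, mu0, mu1):
    """One HMM filter step for a two-state chain whose post-change state is absorbing.

    pi_change is the current posterior probability that the change has occurred.
    """
    # Prediction: an unchanged chain jumps with probability p_change per step
    # (geometric change time); a changed chain never jumps back.
    prior = pi_change + (1.0 - pi_change) * p_change
    # Correction: Bayes' rule with the pre- and post-change likelihoods.
    num = prior * gaussian_pdf(y, mu1)
    return num / (num + (1.0 - prior) * gaussian_pdf(y, mu0))


def threshold_stopping_time(observations, p_change, mu0, mu1, threshold, pi0=0.0):
    """Announce a change the first time the posterior reaches the threshold.

    Returns (stopping index or None, final posterior).
    """
    pi = pi0
    for k, y in enumerate(observations):
        pi = update_belief(pi, y, p_change, mu0, mu1)
        if pi >= threshold:
            return k, pi
    return None, pi


# Hypothetical data: the observation mean shifts from 0 to 2 at index 20.
obs = [0.0] * 20 + [2.0] * 20
k, pi = threshold_stopping_time(obs, p_change=0.05, mu0=0.0, mu1=2.0, threshold=0.9)
print(k, round(pi, 3))
```

Before the shift the posterior settles near a small fixed value well below the threshold; after the shift it rises over a few samples and the rule stops shortly after index 20. Lowering the threshold trades detection delay for more false alarms, which is exactly the trade-off the stopping cost of the POMDP formalizes.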

This chapter considers the following extensions of quickest change detection:

• Example 1: Quickest change detection with phase-distributed change time: classical quickest detection is equivalent to a stopping time POMDP where the underlying Markov chain jumps only once, into an absorbing state (so the change time is geometrically distributed). How should quickest change detection be performed when the change time is phase-distributed and the stopping cost is quadratic in the belief state, so as to penalize the variance of the state estimate?
• Example 2: Quickest transient detection: if the state of nature jumps into a state and later jumps out of it, how should quickest detection of this transient be performed? The problem is equivalent to a stopping time POMDP where the Markov chain jumps only twice.
• Example 3: Risk-sensitive quickest detection: how should quickest detection be performed with an exponential penalty?
• Example 4: Quickest detection with social learning: if individual agents learn an underlying state by performing social learning, how can a global decision-maker perform quickest change detection? As will be shown, this interaction of local and global decision-makers results in interesting non-monotone behavior, and the stopping set is not necessarily convex.
• Example 5: Quickest time herding with social learning: how should a decision-maker estimate an underlying state of nature when agents herd while performing social learning?
• Example 6: how should a monopolist optimally price a product when customers perform social learning? Each time a customer buys the product, the monopolist earns revenue and also gains publicity through social learning. It is shown that it is optimal to start at a high price and then decrease the price over time.
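To make the Markov chain structure of Examples 1 and 2 concrete, the sketch below builds the transition matrices involved. It assumes (the chapter's own parameterization is not given here) that a phase-distributed change time is a sum of independent geometric phases, which is one simple special case of a discrete phase-type distribution, and it uses the variance of a scalar state level under the belief as an example of a stopping cost that is quadratic in the belief. The helper names and the `levels` values are hypothetical.

```python
def phase_type_change_matrix(phase_stay_probs):
    """Transition matrix for a change time that is a sum of geometric phases.

    States 0..n-1 are pre-change phases and state n is the absorbing post-change
    state; from phase i the chain stays with probability phase_stay_probs[i]
    and otherwise advances to phase i + 1.
    """
    n = len(phase_stay_probs)
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i, stay in enumerate(phase_stay_probs):
        P[i][i] = stay
        P[i][i + 1] = 1.0 - stay
    P[n][n] = 1.0  # post-change state is absorbing
    return P


def change_time_pmf(P, horizon):
    """pmf[k] = P(tau = k) for k = 0..horizon, where tau is the first time the
    chain, started in phase 0, enters the absorbing (last) state."""
    n = len(P) - 1
    dist = [1.0] + [0.0] * n
    pmf = [0.0]  # the chain starts pre-change, so tau >= 1
    absorbed_prev = 0.0
    for _ in range(horizon):
        # Propagate the state distribution one step: dist <- dist P.
        dist = [sum(dist[i] * P[i][j] for i in range(n + 1)) for j in range(n + 1)]
        pmf.append(dist[n] - absorbed_prev)
        absorbed_prev = dist[n]
    return pmf


def state_estimate_variance(belief, levels):
    """Variance of the state level under the belief: quadratic in the belief."""
    mean = sum(g * p for g, p in zip(levels, belief))
    second_moment = sum(g * g * p for g, p in zip(levels, belief))
    return second_moment - mean ** 2


geometric = phase_type_change_matrix([0.9])       # one phase: classical geometric case
two_phase = phase_type_change_matrix([0.5, 0.5])  # change time is at least 2
print(change_time_pmf(geometric, 3))
print(change_time_pmf(two_phase, 3))
print(state_estimate_variance([0.5, 0.5], levels=[0.0, 1.0]))
```

With a single phase the construction reduces to the classical two-state chain with a geometric change time. Under the same assumptions, `phase_type_change_matrix([1 - p_enter, 1 - p_leave])` gives a three-state chain that jumps only twice (pre-transient, transient, post-transient), matching the structure described in Example 2; there the detection target would be the middle state rather than the absorbing one. The variance cost vanishes at the vertices of the belief simplex, where the state is known exactly, and is largest when the belief is spread across states with different levels.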

13 - Stopping time POMDPs for quickest change detection

Summary

Access options

Book contents

13 - Stopping time POMDPs for quickest change detection

Summary

Access options

Save book to Kindle

Save book to Dropbox

Save book to Google Drive