Published online by Cambridge University Press: 14 July 2016
Suppose that π is a policy for resource allocation in a stochastic environment and π∗ is an optimal policy. Two existing procedures for policy evaluation are described and compared. Both evaluate π by means of upper bounds on R(π∗) − R(π), the total reward lost when making resource allocations according to π rather than π∗. The bounds developed by these two methods are called Type 1 and Type 2. We demonstrate by example that neither procedure dominates the other in the sense of always yielding tighter bounds. A modification to Type 2 bounds is proposed, resulting in an improved procedure that always dominates the Type 1 approach.
During the course of this research the author was supported by the National Research Council as a Senior Research Associate at the Department of Operations Research, Naval Postgraduate School, Monterey, CA 93943–5000, USA.