Some reward–penalty rules for the multi-armed bandit problem which are asymptotically optimal
Published online by Cambridge University Press: 01 July 2016
Abstract
In the mathematical learning literature, reward–penalty rules have been studied in various decision-theoretic and game-theoretic contexts, including the multi-armed bandit problem. Here we propose an elaboration of Bather's randomised allocation indices that yields rules for the multi-armed bandit problem which are both reward–penalty and asymptotically optimal.
Type: Letters to the Editor
Copyright © Applied Probability Trust 1983