
17 - Optimal control theory and the linear Bellman equation

from VI - Agent-based models

Published online by Cambridge University Press: 07 September 2011

Hilbert J. Kappen, Radboud University
David Barber, University College London
A. Taylan Cemgil, Boğaziçi Üniversitesi, Istanbul
Silvia Chiappa, University of Cambridge

Summary

Introduction

Optimising a sequence of actions to attain some future goal is the general topic of control theory [26, 9]. It views an agent as an automaton that seeks to maximise expected reward (or minimise cost) over some future time period. Two typical examples that illustrate this are motor control and foraging for food.

As an example of a motor control task, consider a human throwing a spear to kill an animal. Throwing a spear requires executing a motor program such that, at the moment the hand releases the spear, the spear has the correct speed and direction to hit the desired target. A motor program is a sequence of actions, and this sequence can be assigned a cost that generally consists of two terms: a path cost, which specifies the energy consumed by contracting the muscles to execute the motor program, and an end cost, which specifies whether the spear will kill the animal, just hurt it, or miss it altogether. The optimal control solution is the sequence of motor commands that kills the animal with minimal physical effort. If x denotes the state (the positions and velocities of the muscles), the optimal control solution is a function u(x, t) that depends on the actual state of the system at each time t and also explicitly on time.
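Written out, the cost of a control trajectory in this finite-horizon setting takes the standard form (a sketch in conventional notation; the symbols φ for the end cost and R for the path cost are the usual choices, not necessarily those used later in the chapter):

C(x(0), u(0 → T)) = φ(x(T)) + ∫_0^T R(x(t), u(t), t) dt.

The Bellman principle turns the minimisation of such a cost into a backward recursion over time. The following sketch illustrates this on a hypothetical discrete toy problem (the states, dynamics and costs are invented for illustration and are not from the chapter); it shows concretely why the resulting control law u(x, t) depends on both state and time:

    import numpy as np

    # Hypothetical toy problem: 5 states, 3 actions, horizon T = 10.
    n_states, n_actions, T = 5, 3, 10
    rng = np.random.default_rng(0)
    next_state = rng.integers(0, n_states, size=(n_states, n_actions))  # deterministic dynamics f(x, u)
    path_cost = rng.random((n_states, n_actions))  # R(x, u): effort per step
    end_cost = rng.random(n_states)                # phi(x): cost at the horizon

    J = end_cost.copy()                  # optimal cost-to-go at time T
    policy = np.zeros((T, n_states), dtype=int)
    for t in reversed(range(T)):         # Bellman backward recursion
        Q = path_cost + J[next_state]    # Q[x, u] = R(x, u) + J(f(x, u))
        policy[t] = Q.argmin(axis=1)     # u(x, t): best action per state and time
        J = Q.min(axis=1)                # J(x, t) = min_u Q[x, u]

    print(policy)  # rows generally differ across t: the control law is time dependent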

Publisher: Cambridge University Press
Print publication year: 2011


References

[1] C. Archambeau, M. Opper, Y. Shen, D. Cornford and J. Shawe-Taylor. Variational inference for diffusion processes. In D. Koller and Y. Singer, editors, Advances in Neural Information Processing Systems 19. MIT Press, 2008.
[2] R. Bellman and R. Kalaba. Selected Papers on Mathematical Trends in Control Theory. Dover, 1964.
[3] D. P. Bertsekas. Dynamic Programming and Optimal Control, second edition. Athena Scientific, 2000.
[4] M. da Silva, F. Durand and J. Popović. Linear Bellman combination for control of character animation. In SIGGRAPH '09: ACM SIGGRAPH 2009 Papers, pages 1–10. ACM, 2009.
[5] R. Dearden, N. Friedman and D. Andre. Model-based Bayesian exploration. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pages 150–159, 1999.
[6] A. A. Feldbaum. Dual control theory, I–IV. Automation and Remote Control, 21–22:874–880, 1033–1039, 1–12, 109–121, 1960.
[7] N. M. Filatov and H. Unbehauen. Adaptive Dual Control. Springer-Verlag, 2004.
[8] W. H. Fleming. Exit probabilities and optimal stochastic control. Applied Mathematics and Optimization, 4:329–346, 1978.
[9] W. H. Fleming and H. M. Soner. Controlled Markov Processes and Viscosity Solutions. Springer-Verlag, 1992.
[10] J. J. Florentin. Optimal, probing, adaptive control of a simple Bayesian system. International Journal of Electronics, 13:165–177, 1962.
[11] G. Fraser-Andrews. A multiple-shooting technique for optimal control. Journal of Optimization Theory and Applications, 102:299–313, 1999.
[12] H. Goldstein. Classical Mechanics. Addison Wesley, 1980.
[13] M. T. Heath. Scientific Computing: An Introductory Survey, second edition. McGraw-Hill, 2002.
[14] U. Jönsson, C. Trygger and P. Ögren. Lectures on optimal control. Unpublished, 2002.
[15] L. P. Kaelbling, M. L. Littman and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99–134, 1998.
[16] H. J. Kappen. A linear theory for control of non-linear stochastic systems. Physical Review Letters, 95:200201, 2005.
[17] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, page P11011, 2005.
[18] H. J. Kappen, V. Gómez and M. Opper. Optimal control as a graphical model inference problem. http://arxiv.org/abs/0901.0633.
[19] H. J. Kappen and S. Tonk. Optimal exploration as a symmetry breaking phenomenon. Technical report, 2010.
[20] M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, pages 209–232, 2002.
[21] P. R. Kumar. Optimal adaptive control of linear-quadratic-Gaussian systems. SIAM Journal on Control and Optimization, 21(2):163–178, 1983.
[22] T. Mensink, J. Verbeek and H. J. Kappen. EP for efficient stochastic control with obstacles. In ECAI, pages 1–6, 2010.
[23] L. S. Pontryagin, V. G. Boltyanskii, R. V. Gamkrelidze and E. F. Mishchenko. The Mathematical Theory of Optimal Processes. Interscience, 1962.
[24] P. Poupart and N. Vlassis. Model-based Bayesian reinforcement learning in partially observable domains. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics (ISAIM), 2008.
[25] E. J. Sondik. The optimal control of partially observable Markov processes. PhD thesis, Stanford University, 1971.
[26] R. Stengel. Optimal Control and Estimation. Dover Publications, 1993.
[27] H. Theil. A note on certainty equivalence in dynamic planning. Econometrica, 25:346–349, 1957.
[28] E. Theodorou, J. Buchli and S. Schaal. Learning policy improvements with path integrals. In International Conference on Artificial Intelligence and Statistics, 2010.
[29] E. Theodorou, J. Buchli and S. Schaal. Reinforcement learning of motor skills in high dimensions: a path integral approach. In International Conference on Robotics and Automation, 2010.
[30] E. A. Theodorou, J. Buchli and S. Schaal. Path integral-based stochastic optimal control for rigid body dynamics. In IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL '09), pages 219–225, 2009.
[31] S. B. Thrun. The role of exploration in learning control. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control. Multiscience Press, 1992.
[32] E. Todorov. Linearly-solvable Markov decision problems. In B. Schölkopf, J. Platt and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 1369–1376. MIT Press, 2007.
[33] E. Todorov. General duality between optimal control and estimation. In 47th IEEE Conference on Decision and Control, pages 4286–4292, 2008.
[34] B. van den Broek, W. Wiegerinck and H. J. Kappen. Graphical model inference in optimal control of stochastic multi-agent systems. Journal of Artificial Intelligence Research, 32:95–122, 2008.
[35] B. van den Broek, W. Wiegerinck and H. J. Kappen. Optimal control in large stochastic multi-agent systems. In Adaptive Agents and Multi-Agent Systems III: Adaptation and Multi-Agent Learning, volume 4865/2008, pages 15–26. Springer, 2008.
[36] R. Weber. Lecture notes on optimization and control. Lecture notes of a course given autumn 2006, 2006.
[37] W. Wiegerinck, B. van den Broek and H. J. Kappen. Stochastic optimal control in continuous space-time multi-agent systems. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, pages 528–535. Association for UAI, 2006.
[38] W. Wiegerinck, B. van den Broek and H. J. Kappen. Optimal on-line scheduling in stochastic multi-agent systems in continuous space and time. In Proceedings of AAMAS, page 8, 2007.
[39] J. Yong and X. Y. Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, 1999.
