A reinforcement learning approach to coordinate exploration with limited communication in continuous action games

Abdel Rodríguez; Peter Vrancx; Ricardo Grau; Ann Nowé

doi:10.1017/S026988891500020X

Published online by Cambridge University Press: 11 February 2016

Abdel Rodríguez, Peter Vrancx, Ann Nowé: Affiliation:
Computational Modeling Lab, Vrije Universiteit Brussel – Pleinlaan 2, 1050 Brussels, Belgium e-mail: abrodrig@vub.ac.be, pvrancx@vub.ac.be, ann.nowe@vub.ac.be
Ricardo Grau: Affiliation:
Center of Studies in Informatics, Universidad Central ‘Marta Abreu’ de Las Villas – Carretera a Camajuaní Km 5, 50100 Villa Clara, Cuba e-mail: rgrau@uclv.edu.cu

Abstract

Learning automata are reinforcement learners belonging to the class of policy iterators. They have been shown to exhibit good convergence properties in a wide range of discrete action game settings. Recently, a new formulation of continuous action reinforcement learning automata (CARLA) was proposed. In this paper, we study the behavior of CARLA in continuous action games and propose a novel method for coordinated exploration of the joint-action space. Our method allows a team of independent learners using CARLA to find the optimal joint action in common interest settings. We first show that independent agents using CARLA converge to a local optimum of the continuous action game. We then introduce a method for coordinated exploration that allows the team of agents to find the global optimum of the game. We validate our approach in a number of experiments.
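
As background for the discrete action setting mentioned in the abstract, the following is a minimal sketch of two independent learning automata playing a common interest matrix game. It is not the CARLA method of the paper: it assumes the standard linear reward-inaction update for discrete actions, and the payoff matrix, learning rate, step count and function names (`lri_update`, `play`) are hypothetical choices made only for illustration.

```python
import random


def lri_update(probs, chosen, reward, rate):
    """Linear reward-inaction update (assumed scheme, not CARLA).

    Shifts probability mass toward the chosen action in proportion to
    a reward in [0, 1]; a reward of 0 leaves the policy unchanged.
    """
    return [
        p + rate * reward * (1.0 - p) if i == chosen else p - rate * reward * p
        for i, p in enumerate(probs)
    ]


def play(payoff, steps=5000, rate=0.05, seed=0):
    """Let two independent automata repeatedly play a common interest game."""
    rng = random.Random(seed)
    n_rows, n_cols = len(payoff), len(payoff[0])
    row = [1.0 / n_rows] * n_rows  # row player's action probabilities
    col = [1.0 / n_cols] * n_cols  # column player's action probabilities
    for _ in range(steps):
        a = rng.choices(range(n_rows), weights=row)[0]
        b = rng.choices(range(n_cols), weights=col)[0]
        r = payoff[a][b]  # both agents observe the same reward, nothing else
        row = lri_update(row, a, r, rate)
        col = lri_update(col, b, r, rate)
    return row, col


# Hypothetical game with two coordinated joint actions: (0, 0) pays 1.0
# and (1, 1) pays only 0.5.
PAYOFF = [[1.0, 0.0], [0.0, 0.5]]
row, col = play(PAYOFF)
print("row policy:", row)
print("col policy:", col)
```

Each agent updates from the shared reward alone, without observing the other agent's action. In a game like this one, such independent learners tend to settle on one of the coordinated joint actions, which need not be the one with the higher payoff; this is the discrete analogue of the local optimum issue that motivates coordinated exploration.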

Type: Articles
Information: The Knowledge Engineering Review, Volume 31, Issue 1: Adaptive Learning Agents, January 2016, pp. 77–95

DOI: https://doi.org/10.1017/S026988891500020X
Copyright: © Cambridge University Press, 2016
