Q-Table compression for reinforcement learning

Published online by Cambridge University Press: 04 December 2018

Leonardo Amado and Felipe Meneguzzi
Affiliation: Pontifical Catholic University of Rio Grande do Sul, Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil; e-mail: leonardo.amado@acad.pucrs.br, felipe.meneguzzi@pucrs.br

Abstract

Reinforcement learning (RL) algorithms are often used to build agents capable of acting in environments without prior knowledge of the environment dynamics. However, these algorithms struggle to converge in environments with large branching factors and the correspondingly large state spaces. In this work, we develop an approach to compress the number of entries in a Q-value table using a deep auto-encoder, along with a set of techniques to mitigate the large branching factor problem. We apply these techniques to a real-time strategy (RTS) game, where both the state space and the branching factor are problematic. We empirically evaluate an implementation of the technique to control agents in an RTS game scenario where classical RL fails, and identify a number of possible avenues of further work on this problem.
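
The core idea in the abstract is that a deep auto-encoder maps high-dimensional game states to a compact latent code, so that tabular Q-learning can index a much smaller table. The sketch below is a minimal illustration of that idea, not the authors' implementation: the PyTorch auto-encoder, the layer sizes, the discretisation of the latent code into Q-table keys, and the helper names (StateAutoEncoder, compress, q_update) are all assumptions made for illustration.

```python
# Minimal sketch (illustrative only) of Q-table compression with a deep auto-encoder:
# raw state vectors are encoded into a small latent code, the code is discretised
# into a hashable key, and tabular Q-learning runs over those keys.
from collections import defaultdict

import torch
import torch.nn as nn


class StateAutoEncoder(nn.Module):
    """Assumed architecture: a small fully connected encoder/decoder pair."""

    def __init__(self, state_dim: int, latent_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z


def train_autoencoder(ae, states, epochs=50, lr=1e-3):
    """Fit the auto-encoder to reconstruct a batch of observed states (MSE loss)."""
    opt = torch.optim.Adam(ae.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        recon, _ = ae(states)
        loss = loss_fn(recon, states)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ae


def compress(ae, state, bins: int = 10):
    """Map a raw state to a coarse, hashable latent key so similar states share an entry."""
    with torch.no_grad():
        z = ae.encoder(state)
    return tuple((z * bins).round().int().tolist())


def q_update(q_table, key, action, reward, next_key, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update, applied to compressed state keys."""
    best_next = max(q_table[next_key].values(), default=0.0)
    q_table[key][action] += alpha * (reward + gamma * best_next - q_table[key][action])


# Q-table indexed by compressed keys rather than raw states.
q_table = defaultdict(lambda: defaultdict(float))
```

In a sketch like this, usage would follow the usual RL loop: encode each observed state with compress, apply q_update after every transition, and periodically retrain the auto-encoder on newly collected states; the binning granularity controls the trade-off between table size and how aggressively distinct states are merged.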

Type
Special Issue Contribution
Copyright
© Cambridge University Press, 2018 

