
Q-Table compression for reinforcement learning

Published online by Cambridge University Press: 04 December 2018

Leonardo Amado
Pontifical Catholic University of Rio Grande do Sul, Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil
Felipe Meneguzzi
Pontifical Catholic University of Rio Grande do Sul, Av. Ipiranga 6681, Porto Alegre, RS, 90619-900, Brazil


Reinforcement learning (RL) algorithms can produce agents that act in an environment without prior knowledge of its dynamics. However, such algorithms struggle to converge in environments with large branching factors and the correspondingly large state spaces they induce. In this work, we develop an approach that compresses the number of entries in a Q-value table using a deep auto-encoder, together with a set of techniques to mitigate the large branching factor problem. We apply these techniques to a real-time strategy (RTS) game, a domain in which both the state space and the branching factor are problematic. We empirically evaluate an implementation of the technique by controlling agents in an RTS game scenario where classical RL fails, and we outline a number of avenues for further work on this problem.
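The core idea of compressing a Q-table with an auto-encoder can be illustrated with a minimal sketch: train an encoder that maps raw state vectors to a small code, then key the Q-table on the (discretized) code, so that similar states share a single entry. Everything below is assumed for illustration (the state dimensions, code size, network shape, and training loop are not taken from the paper); it shows the technique in spirit, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 noisy copies of 8 prototype states (dim 16),
# standing in for the many near-duplicate states an RTS agent visits.
prototypes = rng.integers(0, 2, size=(8, 16)).astype(float)
states = prototypes[rng.integers(0, 8, size=200)]
states += rng.normal(0, 0.05, states.shape)

# One-hidden-layer auto-encoder, 16 -> 4 -> 16, trained by plain
# gradient descent on the squared reconstruction error.
d_in, d_code = 16, 4
W1 = rng.normal(0, 0.1, (d_in, d_code))   # encoder weights
W2 = rng.normal(0, 0.1, (d_code, d_in))   # decoder weights
lr = 0.05

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for _ in range(2000):
    code = sigmoid(states @ W1)   # encode
    recon = code @ W2             # linear decode
    err = recon - states
    # Backpropagate the reconstruction error through both layers.
    grad_W2 = code.T @ err / len(states)
    grad_code = err @ W2.T * code * (1 - code)
    grad_W1 = states.T @ grad_code / len(states)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

def compress(state):
    """Map a raw state to a small discrete key by thresholding its code."""
    return tuple((sigmoid(state @ W1) > 0.5).astype(int))

# The compressed Q-table keys on 4-bit codes instead of raw states,
# so states with the same code share one row of Q-values.
q_table = {}
for s in states:
    q_table.setdefault(compress(s), np.zeros(4))  # 4 actions, illustrative

print(len(q_table))  # far fewer entries than the 200 raw states
```

Because the code has only `2 ** d_code` possible values, the table is bounded at 16 entries here regardless of how many raw states are observed; the trade-off is that states collapsed to the same code become indistinguishable to the learner.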

Special Issue Contribution
© Cambridge University Press, 2018 


