Learning self-play agents for combinatorial optimization problems

Ruiyang Xu; Karl Lieberherr

doi:10.1017/S026988892000020X

Learning self-play agents for combinatorial optimization problems

Part of: Adaptive Learning Agents 2019

Published online by Cambridge University Press: 23 March 2020

Ruiyang Xu

and

Karl Lieberherr

Show author details

Ruiyang Xu: Affiliation:
Northeastern University Khoury College of Computer Sciences, Boston, MA, USA, e-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu
Karl Lieberherr: Affiliation:
Northeastern University Khoury College of Computer Sciences, Boston, MA, USA, e-mails: ruiyang@ccs.neu.edu, lieber@ccs.neu.edu

Article contents

Abstract
References

Get access

Rights & Permissions

Abstract

Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.

Type: Research Article
Information: The Knowledge Engineering Review , Volume 35 , 2020 , e11

DOI: https://doi.org/10.1017/S026988892000020X [Opens in a new window]
Copyright: © Cambridge University Press, 2020

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

References

Anthony, T., Tian, Z. & Barber, D. 2017. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5366–5376.Google Scholar

Auer, P., Cesa-Bianchi, N. & Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235–256.CrossRef Google Scholar

Auger, D., Couetoux, A. & Teytaud, O. 2013. Continuous upper confidence trees with polynomial exploration - consistency. In ECML/PKDD (1), Lecture Notes in Computer Science 8188, 194–209. Springer.Google Scholar

Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A, Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N, Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. & Pascanu, R. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint .Google Scholar

Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint .Google Scholar

Bjornsson, Y. & Finnsson, H. 2009. Cadiaplayer: a simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games 1(1), 4–15.CrossRef Google Scholar

Browne, C., Powley, E. J., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Liebana, D. P., Samothrakis, S. & Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 1–43.CrossRef Google Scholar

Fujikawa, Y. & Min, M. 2013. A new environment for algorithm research using gamification. IEEE International Conference on Electro-Information Technology , EIT 2013, Rapid City, SD, 1–6.Google Scholar

Genesereth, M., Love, N. & Pell, B. 2005. General game playing: Overview of the AAAI competition. AI Magazine 26(2), 62.Google Scholar

Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning 70, 1263–1272. JMLR. org.Google Scholar

Hintikka, J. 1982. Game-theoretical semantics: insights and prospects. Notre Dame Journal of Formal Logic 23(2), 219–241.CrossRef Google Scholar

Khalil, E., Dai, H., Zhang, Y., Dilkina, B. & Song, L. 2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 6348–6358.Google Scholar

Kocsis, L. & Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, ECML ’06, 282–293. Springer-Verlag.CrossRef Google Scholar

Laterre, A., Fu, Y., Jabri, M. K., Cohen, A.-S., Kas, D., Hajjar, K., Dahl, T. S., Kerkeni, A. & Beguir, K. 2018. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint .Google Scholar

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529–533.CrossRef Google Scholar PubMed

Novikov, F. & Katsman, V. 2018. Gamification of problem solving process based on logical rules. In Informatics in Schools. Fundamentals of Computer Science and Software Engineering, Pozdniakov, S. N. & Dagienė, V. (eds). Springer International Publishing, 369–380.CrossRef Google Scholar

Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D. & Wierstra, D. 2017. Imagination-augmented agents for deep reinforcement learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5694–5705. Curran Associates Inc.Google Scholar

Rezende, M. & Chaimowicz, L. 2017. A methodology for creating generic game playing agents for board games. In 2017 16th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), 19–28. IEEE.CrossRef Google Scholar

Selsam, D., Lamm, M., Bunz, B., Liang, P., de Moura, L. & Dill, D. L. 2018. Learning a sat solver from single-bit supervision. arXiv preprint .Google Scholar

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T. P., Simonyan, K. & Hassabis, D. 2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR .Google Scholar

Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. 2018. A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play. Science 362(6419), 1140–1144.CrossRef Google Scholar PubMed

Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. & Hassabis, D. 2017b. Mastering the game of Go without human knowledge. Nature 550, 354.CrossRef Google Scholar PubMed

Sniedovich, M. 2003. OR/MS games: 4. The joy of egg-dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education 4(1), 48–64.CrossRef Google Scholar

Vinyals, O., Fortunato, M. & Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems, Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R. (eds), 28. Curran Associates, Inc., 2692–2700.Google Scholar

Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256.CrossRef Google Scholar

Article contents

Learning self-play agents for combinatorial optimization problems

Abstract

Access options

References

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests