Skip to main content Accessibility help
×
Home

Learning self-play agents for combinatorial optimization problems

  • Ruiyang Xu (a1) and Karl Lieberherr (a1)

Abstract

Recent progress in reinforcement learning (RL) using self-play has shown remarkable performance with several board games (e.g., Chess and Go) and video games (e.g., Atari games and Dota2). It is plausible to hypothesize that RL, starting from zero knowledge, might be able to gradually approach a winning strategy after a certain amount of training. In this paper, we explore neural Monte Carlo Tree Search (neural MCTS), an RL algorithm that has been applied successfully by DeepMind to play Go and Chess at a superhuman level. We try to leverage the computational power of neural MCTS to solve a class of combinatorial optimization problems. Following the idea of Hintikka’s Game-Theoretical Semantics, we propose the Zermelo Gamification to transform specific combinatorial optimization problems into Zermelo games whose winning strategies correspond to the solutions of the original optimization problems. A specially designed neural MCTS algorithm is then introduced to train Zermelo game agents. We use a prototype problem for which the ground-truth policy is efficiently computable to demonstrate that neural MCTS is promising.

Copyright

References

Hide All
Anthony, T., Tian, Z. & Barber, D. 2017. Thinking fast and slow with deep learning and tree search. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5366–5376.
Auer, P., Cesa-Bianchi, N. & Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235256.
Auger, D., Couetoux, A. & Teytaud, O. 2013. Continuous upper confidence trees with polynomial exploration - consistency. In ECML/PKDD (1), Lecture Notes in Computer Science 8188, 194209. Springer.
Battaglia, P. W., Hamrick, J. B., Bapst, V., Sanchez-Gonzalez, A., Zambaldi, V., Malinowski, M., Tacchetti, A., Raposo, D., Santoro, A., Faulkner, R., Gulcehre, C., Song, F., Ballard, A, Gilmer, J., Dahl, G., Vaswani, A., Allen, K., Nash, C., Langston, V., Dyer, C., Heess, N, Wierstra, D., Kohli, P., Botvinick, M., Vinyals, O., Li, Y. & Pascanu, R. 2018. Relational inductive biases, deep learning, and graph networks. arXiv preprint .
Bello, I., Pham, H., Le, Q. V., Norouzi, M. & Bengio, S. 2016. Neural combinatorial optimization with reinforcement learning. arXiv preprint .
Bjornsson, Y. & Finnsson, H. 2009. Cadiaplayer: a simulation-based general game player. IEEE Transactions on Computational Intelligence and AI in Games 1(1), 415.
Browne, C., Powley, E. J., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Liebana, D. P., Samothrakis, S. & Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1), 143.
Fujikawa, Y. & Min, M. 2013. A new environment for algorithm research using gamification. IEEE International Conference on Electro-Information Technology , EIT 2013, Rapid City, SD, 16.
Genesereth, M., Love, N. & Pell, B. 2005. General game playing: Overview of the AAAI competition. AI Magazine 26(2), 62.
Gilmer, J., Schoenholz, S. S., Riley, P. F., Vinyals, O. & Dahl, G. E. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning 70, 12631272. JMLR. org.
Hintikka, J. 1982. Game-theoretical semantics: insights and prospects. Notre Dame Journal of Formal Logic 23(2), 219241.
Khalil, E., Dai, H., Zhang, Y., Dilkina, B. & Song, L. 2017. Learning combinatorial optimization algorithms over graphs. In Advances in Neural Information Processing Systems, 63486358.
Kocsis, L. & Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning, ECML ’06, 282–293. Springer-Verlag.
Laterre, A., Fu, Y., Jabri, M. K., Cohen, A.-S., Kas, D., Hajjar, K., Dahl, T. S., Kerkeni, A. & Beguir, K. 2018. Ranked reward: enabling self-play reinforcement learning for combinatorial optimization. arXiv preprint .
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S. & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540), 529533.
Novikov, F. & Katsman, V. 2018. Gamification of problem solving process based on logical rules. In Informatics in Schools. Fundamentals of Computer Science and Software Engineering, Pozdniakov, S. N. & Dagienė, V. (eds). Springer International Publishing, 369380.
Racanière, S., Weber, T., Reichert, D. P., Buesing, L., Guez, A., Rezende, D., Badia, A. P., Vinyals, O., Heess, N., Li, Y., Pascanu, R., Battaglia, P., Hassabis, D., Silver, D. & Wierstra, D. 2017. Imagination-augmented agents for deep reinforcement learning. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS ’17, 5694–5705. Curran Associates Inc.
Rezende, M. & Chaimowicz, L. 2017. A methodology for creating generic game playing agents for board games. In 2017 16th Brazilian Symposium on Computer Games and Digital Entertainment (SBGames), 19–28. IEEE.
Selsam, D., Lamm, M., Bunz, B., Liang, P., de Moura, L. & Dill, D. L. 2018. Learning a sat solver from single-bit supervision. arXiv preprint .
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T. P., Simonyan, K. & Hassabis, D. 2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. CoRR .
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K. & Hassabis, D. 2018. A general reinforcement learning algorithm that masters Chess, Shogi, and Go through self-play. Science 362(6419), 11401144.
Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A., Guez, A., Hubert, T., Baker, L., Lai, M., Bolton, A., Chen, Y., Lillicrap, T., Hui, F., Sifre, L., van den Driessche, G., Graepel, T. & Hassabis, D. 2017b. Mastering the game of Go without human knowledge. Nature 550, 354.
Sniedovich, M. 2003. OR/MS games: 4. The joy of egg-dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education 4(1), 4864.
Vinyals, O., Fortunato, M. & Jaitly, N. 2015. Pointer networks. In Advances in Neural Information Processing Systems, Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M. & Garnett, R. (eds), 28. Curran Associates, Inc., 2692–2700.
Williams, R. J. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229256.

Learning self-play agents for combinatorial optimization problems

  • Ruiyang Xu (a1) and Karl Lieberherr (a1)

Metrics

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Abstract views

Total abstract views: 0 *
Loading metrics...

* Views captured on Cambridge Core between <date>. This data will be updated every 24 hours.

Usage data cannot currently be displayed.