
What is the simplest model that can account for high-fidelity imitation?

Published online by Cambridge University Press:  10 November 2022

Joel Z. Leibo, Raphael Köster, Alexander Sasha Vezhnevets, Edgar A. Duénez-Guzmán, John P. Agapiou and Peter Sunehag

Affiliation: DeepMind, London EC4A 3TW, UK
jzl@deepmind.com, rkoster@deepmind.com, vezhnick@deepmind.com, duenez@deepmind.com, jagapiou@deepmind.com, sunehag@deepmind.com
www.jzleibo.com

Abstract

What inductive biases must be incorporated into multi-agent artificial intelligence models for them to capture high-fidelity imitation? We think very little is needed. In the right environments, both instrumental- and ritual-stance imitation can emerge from generic learning mechanisms operating on non-deliberative decision architectures. On this view, imitation arises from trial-and-error learning and requires no explicit deliberation.
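The abstract's central claim, that imitation can fall out of generic reward learning with no imitation-specific machinery, can be illustrated with a toy sketch. The following is our own construction, not the authors' model: a tabular Q-learner whose update rule contains only an environmental reward term. Because copying a visible expert happens to be instrumentally rewarding in this hypothetical environment (all names such as K, ALPHA, and Q are illustrative), an imitation policy emerges without any "copy the expert" objective.

```python
# A minimal sketch, assuming a hypothetical "lever" environment: each
# episode one of K levers pays out, an expert bot reliably pulls it,
# and the learner sees the expert's choice before acting. The learner
# is trained only on reward; imitation is never part of the loss.
import random

K = 5            # number of levers; one is "good" each episode
ALPHA = 0.1      # learning rate
EPSILON = 0.1    # exploration rate
EPISODES = 5000

# Q[s][a]: value of pulling lever a after watching the expert pull lever s.
Q = [[0.0] * K for _ in range(K)]

for _ in range(EPISODES):
    good = random.randrange(K)        # hidden good lever this episode
    s = good                          # the expert pulls the good lever
    if random.random() < EPSILON:
        a = random.randrange(K)       # explore
    else:
        a = max(range(K), key=lambda x: Q[s][x])  # exploit
    r = 1.0 if a == good else 0.0     # reward only; no imitation term
    Q[s][a] += ALPHA * (r - Q[s][a])  # one-step tabular Q-learning update

# After training, the greedy policy copies the expert in every state,
# even though the update rule never penalized mismatch with the expert.
policy = [max(range(K), key=lambda x: Q[s][x]) for s in range(K)]
print("expert choice -> learner choice:", list(enumerate(policy)))
assert all(policy[s] == s for s in range(K))
```

The design point of the sketch is that the expert is merely an informative feature of the learner's observation; copying emerges because it maximizes reward, which is the sense in which generic, non-deliberative learning can suffice for imitation.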

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

