
Expression unleashed in artificial intelligence

Published online by Cambridge University Press:  17 February 2023

Ekaterina I. Tolstaya
Affiliation:
Waymo LLC, New York, NY, USA eig@waymo.com http://katetolstaya.com/
Abhinav Gupta
Affiliation:
MILA, Montreal, QC H2S 3H1, Canada abhinavg@nyu.edu https://mila.quebec/en/person/abhinav-gupta/
Edward Hughes
Affiliation:
DeepMind, London, UK. edwardhughes@google.com http://edwardhughes.io

Abstract

The problem of generating generally capable agents is an important frontier in artificial intelligence (AI) research. Such agents may demonstrate open-ended, versatile, and diverse modes of expression, similar to humans. We interpret the work of Heintz & Scott-Phillips as identifying a minimal sufficient set of socio-cognitive biases for the emergence of generally expressive AI, separate from yet complementary to existing algorithms.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2023. Published by Cambridge University Press

In recent years, artificial intelligence (AI) systems powered by machine learning have demonstrated human-level capabilities in a variety of games (Brown & Sandholm, 2018; Jaderberg et al., 2019; Moravčík et al., 2017; Silver et al., 2018), and are increasingly finding applications in the real world (Grigorescu et al., 2020; Hwangbo et al., 2019; Mandhane et al., 2022). Despite this progress, AI systems remain specialists, lacking the breadth of competence across diverse tasks that is characteristic of human intelligence (Chollet, 2019; Hutter, 2000; Legg & Hutter, 2007). Training large models on large, diverse datasets of interactive behavior appears to be a promising direction for increased generality, both in the language domain (Brown et al., 2020) and in 3D simulated worlds (Abramson et al., 2020; Baker et al., 2019; Open Ended Learning Team et al., 2021). On the other hand, it is unclear whether a purely data-driven approach can scale toward open-ended intelligence. This motivates interest in algorithms designed for general learning from scratch, such as emergent communication (Foerster et al., 2016; Lazaridou et al., 2017) or never-ending learning (Mitchell et al., 2018).

The versatility of human social interaction provides a powerful lens through which to study the general capabilities of AI, dating back at least as far as Turing (1950). Partnering with diverse individuals across a wide range of tasks necessarily requires flexible modes of expression, adapted on the fly with new conventions and commitments (Bard et al., 2020; Dafoe et al., 2020). Indeed, domain-agnostic social intelligence may even be a sufficient iterative bootstrap to reach individual general intelligence, via cultural evolution (Henrich, 2015; Cultural General Intelligence Team et al., 2022). The question of how to unleash expression in AI is therefore timely and relevant. Research in this direction could even provide new insights into the evolutionary psychology of language, echoing recent links between AI and neuroscience (Macpherson et al., 2021; Savage, 2019).

We argue that the target article can be interpreted as identifying a minimal set of socio-cognitive biases that may lead to improved versatility in AI, particularly in interaction with humans. Following the agent-environment model of reinforcement learning algorithms (Sutton & Barto, 2018), we identify desirable properties of the environment and of the agent, inspired by the co-evolutionary ecology of human communication. We relate these perspectives to existing approaches in AI, showing that they are relatively underrepresented and thus provide valuable inspiration for future research.
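As an illustration of this framing, the following sketch (entirely our own; names such as SocialEnvironment, ExpressiveAgent, and partner_a are hypothetical) separates the two loci of design: environment-side properties, such as whether partners can be chosen, live in the environment's step function, while agent-side socio-cognitive biases, such as inferring a partner's communicative intention, live in the agent's policy.

```python
# A minimal sketch of the agent-environment framing (Sutton & Barto, 2018).
# All class, key, and partner names are illustrative assumptions, not an API.

class SocialEnvironment:
    """Environment-side design choices, e.g., a partner-choice ecology."""

    def reset(self):
        return {"partner_pool": ["partner_a", "partner_b"], "signal": None}

    def step(self, observation, action):
        # Whether and how the agent's choice of partner shapes its future
        # observations and rewards is an environment property.
        reward = 1.0 if action["partner"] == "partner_a" else 0.0
        next_obs = {"partner_pool": observation["partner_pool"], "signal": "hello"}
        return next_obs, reward


class ExpressiveAgent:
    """Agent-side design choices, e.g., pragmatic inference and ostension."""

    def act(self, observation):
        # Placeholder for inferring a partner's communicative intention
        # before deciding whom to interact with and what to express.
        inferred_intent = observation["signal"]
        return {"partner": "partner_a", "message": inferred_intent}


env, agent = SocialEnvironment(), ExpressiveAgent()
obs = env.reset()
for _ in range(3):
    obs, reward = env.step(obs, agent.act(obs))
```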

In the social environment, a partner choice ecology is perhaps the main driver for the evolution of ostension and inference. These capabilities underlie all human communication and expression because they enable humans to influence and decode the intentions of others. A partner choice social ecology develops these capabilities whenever humans can select their teammates. In harmony with these observations, partner choice catalyzes artificial learning agents to find the tit-for-tat solution to the Prisoner's Dilemma, a strategy that not only plays cooperatively, but also encourages others to cooperate (Anastassacos et al., 2020). Human feedback can itself be seen as a form of partner choice, when humans choose which AI models they prefer to interact with. Indeed, social interaction with humans in the loop promotes generalizable and robust AI (Carroll et al., 2019; Jaques et al., 2018). These works are a proof of concept that partner choice is important for generally expressive AI, and there is much yet to explore.
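To make the partner-choice mechanism concrete, the sketch below pairs a reputation-based partner selection rule with an iterated Prisoner's Dilemma, so that defectors are simply not chosen. It is a hand-coded toy illustration under our own assumptions (the strategy names, payoff table, and reputation scores are invented), not the reinforcement learning setup of Anastassacos et al. (2020).

```python
# Payoffs for the row player in a one-shot Prisoner's Dilemma.
# Keys are (my_move, partner_move); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(history):
    """Cooperate on the first round, then copy the partner's previous move."""
    return "C" if not history else history[-1][1]

def always_defect(history):
    return "D"

def play_match(strategy_a, strategy_b, rounds=10):
    """Play an iterated Prisoner's Dilemma and return both players' total payoffs."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a, move_b = strategy_a(hist_a), strategy_b(hist_b)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append((move_a, move_b))
        hist_b.append((move_b, move_a))
    return score_a, score_b

# Partner choice: agents prefer partners with a reputation for cooperating,
# so the defector is left unchosen and earns nothing this round.
population = {"tft_1": tit_for_tat, "tft_2": tit_for_tat, "defector": always_defect}
reputation = {"tft_1": 1.0, "tft_2": 1.0, "defector": 0.0}

chooser = "tft_1"
partner = max((name for name in population if name != chooser), key=reputation.get)
print(partner, play_match(population[chooser], population[partner]))
```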

The paradigm of emergent communication has shown great promise in training artificial agents, both in situations where incentives are aligned and in settings requiring negotiation or partial competition (Lazaridou & Baroni, 2020). Typically, the symbols used for communication do not have any pre-existing semantics. Rather, their meaning emerges during training, leading to “code-model” communication (Scott-Phillips, 2014). Various studies (Bouchacourt & Baroni, 2019; Kottur et al., 2017; Resnick et al., 2020) have found that the resulting protocols are not human-interpretable and do not share the structural features of human language. By contrast, humans are capable of devising generalizable protocols in a zero-shot or few-shot manner (Kirby et al., 2008; Scott-Phillips et al., 2009).
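The sketch below illustrates how an arbitrary code emerges in a minimal Lewis-style signalling game, where neither the speaker nor the listener starts with any symbol semantics. It is a deliberately simplified, self-contained illustration of the emergent communication setting (the sizes, learning rate, and bandit-style update are our own assumptions), not a reproduction of any cited system.

```python
import numpy as np

# The speaker sees a referent and emits a token with no pre-assigned meaning;
# the listener guesses the referent from the token. Both are rewarded only on
# a correct guess, so whatever mapping they converge on is an emergent,
# arbitrary "code-model" convention.
rng = np.random.default_rng(0)
n_objects, n_symbols, lr = 3, 3, 0.1
speaker = np.zeros((n_objects, n_symbols))   # speaker logits per referent
listener = np.zeros((n_symbols, n_objects))  # listener logits per token

def sample(logits):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(len(p), p=p)

for step in range(5000):
    target = rng.integers(n_objects)
    symbol = sample(speaker[target])
    guess = sample(listener[symbol])
    reward = 1.0 if guess == target else 0.0
    # REINFORCE-style bandit updates toward whichever convention succeeds.
    speaker[target, symbol] += lr * (reward - 0.5)
    listener[symbol, guess] += lr * (reward - 0.5)

print(np.argmax(speaker, axis=1))  # the emergent referent -> token mapping
```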

We argue that this lacuna can be resolved if Gricean pragmatics is viewed as a fundamental objective in the design of agent algorithms for emergent communication. There is already promising work in this direction (Eccles et al., 2019; Kang et al., 2020; Pandia et al., 2021), but pragmatic reasoning is still often regarded as a supplementary bolt-on. Inverting this viewpoint would put inference and ostension at the heart of AI learning algorithms. For instance, an agent with an inverse model of its own policy may use it to infer the communicative intentions of others on the fly, a computational analogue of simulation theory (Gordon, 1986; Heal, 1986). Alternatively, one might hope that such a model is constructed implicitly during the course of meta-learning across a population of partners (Gupta et al., 2021; Strouse et al., 2021). Furthermore, such approaches can readily be combined with data-driven language models (Lowe et al., 2020).
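As one concrete and well-known instance of inference by inverting a speaker model, the sketch below runs a toy Rational Speech Acts calculation for a scalar implicature. We include it purely as an illustration of what putting pragmatics at the heart of the objective looks like computationally; the lexicon and priors are assumptions, and the works cited above do not necessarily use this formulation.

```python
import numpy as np

# Rows are meanings, columns are utterances; entries mark literal truth.
meanings = ["some but not all", "all"]
utterances = ["some", "all"]
truth = np.array([[1.0, 0.0],
                  [1.0, 1.0]])
prior = np.array([0.5, 0.5])  # uniform prior over meanings

def norm(m, axis):
    return m / m.sum(axis=axis, keepdims=True)

L0 = norm(truth * prior[:, None], axis=0)  # literal listener P(meaning | utterance)
S1 = norm(L0, axis=1)                      # speaker prefers informative utterances
L1 = norm(S1 * prior[:, None], axis=0)     # pragmatic listener inverts the speaker

# Hearing "some", the pragmatic listener infers "some but not all" (p = 0.75):
# a speaker who meant "all" would have said "all".
print(dict(zip(meanings, L1[:, utterances.index("some")])))
```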

There is a close relationship between pragmatic capabilities and theory of mind, a topic that has received some attention in the AI literature (Moreno et al., 2021; Rabinowitz et al., 2018). The ability to infer the beliefs of others has been shown to aid convention-building, leading to more generalizable conventions across diverse agents (Foerster et al., 2019; Hu & Foerster, 2020; Hu et al., 2021). Moreover, when agents are incentivized to manipulate the learning of others, they achieve greater success across a variety of games, including those in which communication is useful (Foerster et al., 2017; Jaques et al., 2019; Yang et al., 2020). As they strive for more general and versatile agents, algorithm designers could benefit greatly from understanding the cognitive bases of punishment and teaching in humans.
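A small sketch of what belief inference over a partner's convention can look like follows; the two candidate conventions, their signalling probabilities, and the Bayesian update are invented for illustration and do not correspond to any specific cited algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# P(signal | state) under two hypothetical conventions a partner might use.
conventions = {
    "convention_A": np.array([[0.9, 0.1], [0.1, 0.9]]),  # signal mirrors the state
    "convention_B": np.array([[0.1, 0.9], [0.9, 0.1]]),  # signal flips the state
}
belief = {name: 0.5 for name in conventions}  # uniform prior over conventions

true_convention = "convention_B"
for _ in range(20):
    state = rng.integers(2)
    signal = rng.choice(2, p=conventions[true_convention][state])
    # Bayes rule: weight each hypothesis by the likelihood of what was observed.
    for name, table in conventions.items():
        belief[name] *= table[state, signal]
    total = sum(belief.values())
    belief = {name: b / total for name, b in belief.items()}

print(belief)  # probability mass concentrates on the partner's actual convention
```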

We conclude with an example of real-world importance. Autonomous vehicles (AVs) have the potential to make transportation safer and more convenient. To optimize for safety in interactive situations, an AV must both predict other road users and be predictable to them. In other words, AVs require ostension and inference capabilities (Dolgov, 2021). Existing AV systems use data from human drivers to generate human-like plans and to predict other road users' behavior (Sadigh et al., 2016; Tolstaya et al., 2021). However, such data may not be enough if the behavior of others is strongly influenced by the autonomous car itself, particularly in previously unseen scenarios. Hence, just as in the target article, we arrive at the need for a metarepresentational framework: a means of reasoning over the representations that an AV induces in other road users. Communicative hardware for AVs already exists (Habibovic et al., 2018), alongside plans to elicit online human feedback as a guide (Open Ended Learning Team et al., 2021). Ostensive communication may be a key ingredient for safe autonomous driving in highly interactive urban environments.
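To make the ostension-and-inference requirement concrete, here is a schematic sketch of planning with conditional prediction and legibility. Every function, plan name, and number is a hypothetical placeholder chosen for illustration; it does not describe any deployed AV system.

```python
# Score candidate plans by (i) how predictable our intent is to the other driver
# and (ii) the interaction risk implied by a prediction of the other driver
# conditioned on each candidate plan.

CANDIDATE_PLANS = ["yield_early", "yield_late", "proceed"]

def predict_other(plan):
    """Hypothetical conditional prediction: P(other driver proceeds | our plan)."""
    return {"yield_early": 0.95, "yield_late": 0.6, "proceed": 0.1}[plan]

def legibility(plan):
    """Hypothetical score for how unambiguously the plan signals our intent."""
    return {"yield_early": 0.9, "yield_late": 0.3, "proceed": 0.8}[plan]

def collision_risk(plan, p_other_proceeds):
    """Risk is highest when both parties are likely to proceed at once."""
    p_we_proceed = 1.0 if plan == "proceed" else 0.0
    return p_we_proceed * p_other_proceeds

def score(plan):
    p_other = predict_other(plan)
    return legibility(plan) - 10.0 * collision_risk(plan, p_other)

best = max(CANDIDATE_PLANS, key=score)
print(best)  # "yield_early": legible to others and low interaction risk
```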

Financial support

This work was supported by DeepMind, Waymo, and MILA.

Conflict of interest

Edward Hughes is employed at DeepMind and owns shares in Alphabet. Ekaterina Tolstaya is employed at Waymo and will be granted shares in Waymo. Abhinav Gupta is a student at MILA.

References

Abramson, J., Ahuja, A., Barr, I., Brussee, A., Carnevale, F., Cassin, M., … Zhu, R. (2020). Imitating interactive intelligence. arXiv preprint, arXiv:2012.05672.
Anastassacos, N., Hailes, S., & Musolesi, M. (2020). Partner selection for the emergence of cooperation in multi-agent systems using reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 05, pp. 7047–7054).
Baker, B., Kanitscheider, I., Markov, T., Wu, Y., Powell, G., McGrew, B., & Mordatch, I. (2019). Emergent tool use from multi-agent autocurricula. arXiv preprint, arXiv:1909.07528.
Bard, N., Foerster, J. N., Chandar, S., Burch, N., Lanctot, M., Song, H. F., … Bowling, M. (2020). The Hanabi challenge: A new frontier for AI research. Artificial Intelligence, 280, 103216.
Bouchacourt, D., & Baroni, M. (2019). Miss Tools and Mr Fruit: Emergent communication in agents learning about object affordances. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy (pp. 3909–3918). Association for Computational Linguistics.
Brown, N., & Sandholm, T. (2018). Superhuman AI for heads-up no-limit poker: Libratus beats top professionals. Science, 359(6374), 418–424.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, Article 159, 1877–1901.
Carroll, M., Shah, R., Ho, M. K., Griffiths, T., Seshia, S., Abbeel, P., & Dragan, A. (2019). On the utility of learning about humans for human-AI coordination. Advances in Neural Information Processing Systems, 32, Article 465, 5174–5185.
Chollet, F. (2019). On the measure of intelligence. arXiv preprint, arXiv:1911.01547.
Cultural General Intelligence Team, Bhoopchand, A., Brownfield, B., Collister, A., Lago, A. D., Edwards, A., … Zhang, L. M. (2022). Learning robust real-time cultural transmission without human data. arXiv preprint, arXiv:2203.00715.
Dafoe, A., Hughes, E., Bachrach, Y., Collins, T., McKee, K. R., Leibo, J. Z., … Graepel, T. (2020). Open problems in cooperative AI. arXiv preprint, arXiv:2012.08630.
Dolgov, D. (2021). How we've built the world's most experienced urban driver. Waypoint, the official Waymo blog. https://blog.waymo.com/2021/08/MostExperiencedUrbanDriver.html
Eccles, T., Bachrach, Y., Lever, G., Lazaridou, A., & Graepel, T. (2019). Biases for emergent communication in multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 32, Article 1176, 13121–13131.
Foerster, J., Assael, I. A., De Freitas, N., & Whiteson, S. (2016). Learning to communicate with deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems, 29, 2145–2153.
Foerster, J. N., Chen, R. Y., Al-Shedivat, M., Whiteson, S., Abbeel, P., & Mordatch, I. (2017). Learning with opponent-learning awareness. In Proceedings of the International Conference on Autonomous Agents and MultiAgent Systems (AAMAS '18) (pp. 122–130). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC.
Foerster, J., Song, F., Hughes, E., Burch, N., Dunning, I., Whiteson, S., … Bowling, M. (2019). Bayesian action decoder for deep multi-agent reinforcement learning. In International Conference on Machine Learning (pp. 1942–1951). PMLR.
Gordon, R. M. (1986). Folk psychology as simulation. Mind & Language, 1(2), 158–171.
Grigorescu, S., Trasnea, B., Cocias, T., & Macesanu, G. (2020). A survey of deep learning techniques for autonomous driving. Journal of Field Robotics, 37(3), 362–386.
Gupta, A., Lanctot, M., & Lazaridou, A. (2021). Dynamic population-based meta-learning for multi-agent communication with natural language. Advances in Neural Information Processing Systems, 34, 16899–16912.
Habibovic, A., Lundgren, V. M., Andersson, J., Klingegård, M., Lagström, T., Sirkka, A., … Larsson, P. (2018). Communicating intent of automated vehicles to pedestrians. Frontiers in Psychology, 9, 1336.
Heal, J. (1986). Replication and functionalism. Language, Mind, and Logic, 1, 135–150.
Henrich, J. (2015). The secret of our success. Princeton University Press.
Hu, H., & Foerster, J. N. (2020). Simplified action decoder for deep multi-agent reinforcement learning. International Conference on Learning Representations.
Hu, H., Lerer, A., Cui, B., Pineda, L., Brown, N., & Foerster, J. N. (2021). Off-belief learning. Proceedings of Machine Learning Research, 139, 4369–4379.
Hutter, M. (2000). A theory of universal artificial intelligence based on algorithmic complexity. arXiv preprint, arXiv:cs/0004001.
Hwangbo, J., Lee, J., Dosovitskiy, A., Bellicoso, D., Tsounis, V., Koltun, V., & Hutter, M. (2019). Learning agile and dynamic motor skills for legged robots. Science Robotics, 4(26), eaau5872.
Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Castañeda, A. G., … Graepel, T. (2019). Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443), 859–865.
Jaques, N., McCleary, J., Engel, J., Ha, D., Bertsch, F., Eck, D., & Picard, R. (2018). Learning via social awareness: Improving a deep generative sketching model with facial feedback. In 2nd International Workshop on Artificial Intelligence in Affective Computing, Proceedings of Machine Learning Research, 86, 1–9.
Jaques, N., Lazaridou, A., Hughes, E., Gülçehre, Ç., Ortega, P. A., Strouse, D., … Freitas, N. D. (2019). Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In Proceedings of the 36th International Conference on Machine Learning, PMLR (Vol. 97, pp. 3040–3049).
Kang, Y., Wang, T., & de Melo, G. (2020). Incorporating pragmatic reasoning communication into emergent language. Advances in Neural Information Processing Systems, 33, 10348–10359.
Kirby, S., Cornish, H., & Smith, K. (2008). Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language. Proceedings of the National Academy of Sciences, 105(31), 10681–10686.
Kottur, S., Moura, J., Lee, S., & Batra, D. (2017). Natural language does not emerge “naturally” in multi-agent dialog. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (pp. 2962–2967). Association for Computational Linguistics.
Lazaridou, A., & Baroni, M. (2020). Emergent multi-agent communication in the deep learning era. arXiv preprint, arXiv:2006.02419.
Lazaridou, A., Peysakhovich, A., & Baroni, M. (2017). Multi-agent cooperation and the emergence of (natural) language. International Conference on Learning Representations.
Legg, S., & Hutter, M. (2007). Universal intelligence: A definition of machine intelligence. Minds & Machines, 17, 391–444.
Lowe, R., Gupta, A., Foerster, J. N., Kiela, D., & Pineau, J. (2020). On the interaction between supervision and self-play in emergent communication. International Conference on Learning Representations.
Macpherson, T., Churchland, A., Sejnowski, T., DiCarlo, J., Kamitani, Y., Takahashi, H., & Hikida, T. (2021). Natural and artificial intelligence: A brief introduction to the interplay between AI and neuroscience research. Neural Networks, 144, 603–613.
Mandhane, A., Zhernov, A., Rauh, M., Gu, C., Wang, M., Xue, F., … Mann, T. A. (2022). MuZero with self-competition for rate control in VP9 video compression. arXiv preprint, arXiv:2202.06626.
Mitchell, T., Cohen, W., Hruschka, E., Talukdar, P., Yang, B., Betteridge, J., … Welling, J. (2018). Never-ending learning. Communications of the ACM, 61(5), 103–115.
Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., & Bowling, M. (2017). DeepStack: Expert-level artificial intelligence in heads-up no-limit poker. Science, 356(6337), 508–513.
Moreno, P., Hughes, E., McKee, K. R., Pires, B. Á., & Weber, T. (2021). Neural recursive belief states in multi-agent reinforcement learning. arXiv preprint, arXiv:2102.02274.
Pandia, L., Cong, Y., & Ettinger, A. (2021). Pragmatic competence of pre-trained language models through the lens of discourse connectives. In Proceedings of the 25th Conference on Computational Natural Language Learning (pp. 367–379). Association for Computational Linguistics.
Rabinowitz, N., Perbet, F., Song, F., Zhang, C., Eslami, S. M. A., & Botvinick, M. (2018). Machine theory of mind. In Proceedings of the 35th International Conference on Machine Learning, Proceedings of Machine Learning Research, 80, 4218–4227.
Resnick, C., Gupta, A., Foerster, J., Dai, A. M., & Cho, K. (2020). Capacity, bandwidth, and compositionality in emergent language learning. International Conference on Autonomous Agents and Multiagent Systems. https://doi.org/10.48550/arXiv.1910.11424
Sadigh, D., Sastry, S., Seshia, S. A., & Dragan, A. D. (2016). Planning for autonomous cars that leverage effects on human actions. Robotics: Science and Systems XII.
Savage, N. (2019). How AI and neuroscience drive each other forwards. Nature, 571(7766), S15–S17.
Scott-Phillips, T. C. (2014). Speaking our minds: Why human communication is different, and how language evolved to make it special. Macmillan International Higher Education.
Scott-Phillips, T. C., Kirby, S., & Ritchie, G. R. (2009). Signalling signalhood and the emergence of communication. Cognition, 113(2), 226–233.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., … Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.
Strouse, D. J., McKee, K. R., Botvinick, M., Hughes, E., & Everett, R. (2021). Collaborating with humans without human data. Advances in Neural Information Processing Systems, 34, 14502–14515.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Open Ended Learning Team, et al. (2021). Open-ended learning leads to generally capable agents. arXiv preprint, arXiv:2107.12808.
Tolstaya, E., Mahjourian, R., Downey, C., Vadarajan, B., Sapp, B., & Anguelov, D. (2021). Identifying driver interactions via conditional behavior prediction. In 2021 IEEE International Conference on Robotics and Automation (ICRA) (pp. 3473–3479). IEEE Press.
Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236), 433–460.
Yang, J., Li, A., Farajtabar, M., Sunehag, P., Hughes, E., & Zha, H. (2020). Learning to incentivize other learning agents. Advances in Neural Information Processing Systems, 33, 1275.