
Is human compositionality meta-learned?

Published online by Cambridge University Press:  23 September 2024

Jacob Russin
Affiliation:
Department of Computer Science, Brown University, Providence, RI, USA; Department of Cognitive and Psychological Sciences, Brown University, Providence, RI, USA. jake_russin@brown.edu, https://jlrussin.github.io/
Sam Whitman McGrath
Affiliation:
Department of Philosophy, Brown University, Providence, RI, USA. sam_mcgrath1@brown.edu, https://scholar.google.com/citations?user=B3b7kAYAAAAJ&hl=en
Ellie Pavlick
Affiliation:
Department of Computer Science, Brown University, Providence, RI, USA. ellie_pavlick@brown.edu, https://cs.brown.edu/people/epavlick/
Michael J. Frank*
Affiliation:
Department of Cognitive and Psychological Sciences, Carney Institute for Brain Science, Brown University, Providence, RI, USA. michael_frank@brown.edu, http://ski.clps.brown.edu/
*Corresponding author.

Abstract

Recent studies suggest that meta-learning may provide an original solution to an enduring puzzle about whether neural networks can explain compositionality – in particular, by raising the prospect that compositionality can be understood as an emergent property of an inner-loop learning algorithm. We elaborate on this hypothesis and consider its empirical predictions regarding the neural mechanisms and development of human compositionality.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

Binz et al. review recent meta-learned models that can reproduce human-like compositional generalization behaviors (Lake & Baroni, Reference Lake and Baroni2023), but they stop short of endorsing meta-learning as a theoretical framework for understanding human compositionality. Here, we elaborate on this proposal, articulating the hypothesis that human compositionality can be understood as an emergent property of an inner-loop, in-context learning algorithm that is itself meta-learned.

Compositionality has played a key theoretical role in cognitive science since its inception (Chomsky, Reference Chomsky1957), providing an explanation for human systematic and productive generalization behaviors. These phenomena are readily explained by the compositionality of classical cognitive architectures, as the design of their symbolic representations and structure-sensitive operations intrinsically guarantees that they can redeploy familiar constituents in novel constructions (Fodor & Pylyshyn, Reference Fodor and Pylyshyn1988). It has been argued that neural networks are in principle incapable of playing the same explanatory role because they lack these architectural features (Fodor & Pylyshyn, Reference Fodor and Pylyshyn1988; Marcus, Reference Marcus1998).
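To fix ideas, the toy example below (our own illustration in Python, patterned loosely after the sequence-to-sequence splits of Lake and Baroni, 2018, rather than taken from any cited benchmark) shows the kind of test at issue: every primitive and modifier appears during training, but the test probes combinations that never co-occurred.

```python
# Toy systematicity split (illustrative only): familiar parts, novel combinations.
PRIMITIVES = {"walk": "WALK", "run": "RUN", "jump": "JUMP"}
MODIFIERS = {"twice": 2, "thrice": 3}

def interpret(command: str) -> str:
    """Compositional ground truth, e.g. 'jump twice' -> 'JUMP JUMP'."""
    verb, *rest = command.split()
    repetitions = MODIFIERS[rest[0]] if rest else 1
    return " ".join([PRIMITIVES[verb]] * repetitions)

# Training set: 'jump' occurs only in isolation...
train = ["walk", "run", "jump", "walk twice", "run thrice"]
# ...while the test set demands recombining it with familiar modifiers.
test = ["jump twice", "jump thrice"]

for command in test:
    print(command, "->", interpret(command))  # jump twice -> JUMP JUMP
```

A learner that has induced the verb/modifier rule generalizes to the test set immediately; standard networks trained only on the training set typically do not (Lake & Baroni, 2018).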

Much work has explored inductive biases that might encourage compositionality to emerge in neural networks (Russin, Jo, O'Reilly, & Bengio, Reference Russin, Jo, O'Reilly and Bengio2020a; Smolensky, Reference Smolensky1990; Webb et al., Reference Webb, Frankland, Altabaa, Krishnamurthy, Campbell, Russin and Cohen2024), but meta-learning offers an original solution to the puzzle. As Binz et al. emphasize, when an inner-loop, in-context learning algorithm emerges within the activation dynamics of a meta-learning neural network, it can have fundamentally different properties than the outer-loop algorithm. Thus, even if the outer-loop algorithm lacks these inductive biases, the network may nevertheless implement an emergent in-context learning algorithm that embodies them implicitly.
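To make the two-loop picture concrete, the following is a minimal sketch (in PyTorch; `InContextLearner` and `sample_episode` are our own illustrative placeholders, not code from any model discussed here) of how such an algorithm is meta-trained: the outer loop adjusts weights by gradient descent across many episodes, while at test time adaptation to a new task occurs entirely within a forward pass over the episode's context.

```python
import torch
import torch.nn as nn

class InContextLearner(nn.Module):
    """Any sequence model can serve as the learner; a small transformer
    encoder stands in for it here."""
    def __init__(self, vocab_size=32, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):  # tokens: (batch, seq) of token ids
        return self.out(self.encoder(self.embed(tokens)))

def sample_episode(batch=8, seq_len=16, vocab=32):
    # Placeholder task sampler: a real meta-learning dataset would pack
    # study examples (input-output demonstrations) and query items for a
    # freshly sampled task into each sequence.
    return torch.randint(vocab, (batch, seq_len))

model = InContextLearner()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):              # OUTER loop: slow, gradient-based
    episode = sample_episode()
    logits = model(episode[:, :-1])  # predict each next token from context
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), episode[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# INNER loop: weights are frozen; adapting to a new task happens entirely
# within the forward pass, conditioned on the demonstrations in context.
with torch.no_grad():
    answers = model(sample_episode()).argmax(dim=-1)
```

The crucial point is that the inner loop inherits no explicit compositional machinery from the outer loop: whatever inductive biases it embodies are emergent properties of the learned activation dynamics.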

Lake and Baroni (Reference Lake and Baroni2023) have shown that such an inner-loop algorithm can pass tests of compositionality that standard neural networks fail (Lake & Baroni, Reference Lake, Baroni, Dy and Krause2018). The question, then, is whether such networks can serve as explanatory models of human compositional generalization. Can we think of human compositionality as an emergent property of an inner-loop, in-context learning algorithm? How might we evaluate such a hypothesis? Here, we consider two independent aspects of this proposal: First, its implications for neural mechanisms, and second, for development.

One straightforward mechanistic prediction is that employing inner-loop, in-context learning mechanisms, rather than outer-loop learning mechanisms, should facilitate compositional generalization behaviors. Cognitive and computational neuroscience provides empirical support for this prediction. Cognitive control – the ability to overcome existing prepotent responses and to flexibly adapt to arbitrary goals (Miller & Cohen, Reference Miller and Cohen2001) – is an important capacity for human in-context learning. The neural mechanisms known to be involved in cognitive control, such as working memory, gating, and top-down modulation in the prefrontal cortex (Miller & Cohen, Reference Miller and Cohen2001; O'Reilly & Frank, Reference O'Reilly and Frank2006; Russin, O'Reilly, & Bengio, Reference Russin, O'Reilly and Bengio2020b), are also thought to be essential to compositional abilities such as inferring and applying rules (Calderon, Verguts, & Frank, Reference Calderon, Verguts and Frank2022; Collins & Frank, Reference Collins and Frank2013; Frank & Badre, Reference Frank and Badre2012; Kriete, Noelle, Cohen, & O'Reilly, Reference Kriete, Noelle, Cohen and O'Reilly2013), deductive and inductive reasoning (Crescentini et al., Reference Crescentini, Seyed-Allaei, De Pisapia, Jovicich, Amati and Shallice2011; Goel, Reference Goel2007), and processing complex syntax (Thompson-Schill, Reference Thompson-Schill and Cutler2005). Thus, a shared set of neural mechanisms may underlie both in-context learning and compositionality in humans, lending support to the meta-learning hypothesis.

A second, independent prediction is a developmental one – that human compositional generalization abilities are themselves meta-learned over the course of development. Adults come into any psychological experiment equipped with a wealth of prior experience. The meta-learning hypothesis predicts that this includes experiences encouraging the adoption of more compositional learning strategies (i.e., ones sensitive to implicit compositional structure). In general, children exhibit a developmental trajectory consistent with this hypothesis. Older children learn new tasks more efficiently (Bergelson, Reference Bergelson2020), especially when these tasks involve cognitive capacities essential to in-context learning, such as working memory and executive functions (Munakata, Snyder, & Chatham, Reference Munakata, Snyder and Chatham2012). Furthermore, children improve throughout development on tasks involving the composition of rules (Piantadosi & Aslin, Reference Piantadosi and Aslin2016; Piantadosi, Palmeri, & Aslin, Reference Piantadosi, Palmeri and Aslin2018).

Innate mechanisms or inductive biases may still be required to successfully meta-learn a compositional inner-loop algorithm in the first place. Indeed, studies in machine learning have shown that architecture seems to be an important factor in determining whether in-context learning capabilities emerge (Chan et al., Reference Chan, Santoro, Lampinen, Wang, Singh, Richemond and Hill2022). Similarly, findings from cognitive and computational neuroscience have emphasized the importance of architectural features such as prefrontal gating mechanisms for the emergence of abstract representations that could mediate subsequent in-context generalization abilities (Collins & Frank, Reference Collins and Frank2013; Frank & Badre, Reference Frank and Badre2012; Kriete et al., Reference Kriete, Noelle, Cohen and O'Reilly2013; Rougier, Noelle, Braver, Cohen, & O'Reilly, Reference Rougier, Noelle, Braver, Cohen and O'Reilly2005). These inductive biases can also explain incidental hierarchical rule learning and generalization in infants (Werchan, Collins, Frank, & Amso, Reference Werchan, Collins, Frank and Amso2015, Reference Werchan, Collins, Frank and Amso2016). Thus, a combination of innate architectural features and meta-learning experiences may be necessary for human compositionality to emerge.

The meta-learning datasets used in previous modeling efforts have typically been developmentally unrealistic because they have been contrived to engender narrow compositional generalization abilities that are specific to a particular type of task. Could meta-learning in less explicitly structured learning scenarios lead to the acquisition of broader compositional generalization abilities? This question deserves careful empirical study, but we may draw a preliminary insight from the success of large language models (Brown et al., Reference Brown, Mann, Ryder, Subbiah, Kaplan and Dhariwal2020), which develop in-context learning abilities (von Oswald et al., Reference von Oswald, Niklasson, Schlegel, Kobayashi, Zucchet, Scherrer and Sacramento2023; Xie, Raghunathan, Liang, & Ma, Reference Xie, Raghunathan, Liang and Ma2022) that in some cases exhibit human-like compositionality (Webb, Holyoak, & Lu, Reference Webb, Holyoak and Lu2022; Wei et al., Reference Wei, Wang, Schuurmans, Bosma, Ichter, Xia and Zhou2023; Zhou et al., Reference Zhou, Schärli, Hou, Wei, Scales, Wang and Chi2022). Unlike models explicitly designed for meta-learning, large language models are trained to predict the next token on very large datasets of unstructured text. These datasets contain more language data than humans are exposed to in an entire lifetime (Linzen & Baroni, Reference Linzen and Baroni2021), so future work needs to investigate what kinds of inductive biases are necessary to improve their sample efficiency. However, these models provide proof of concept that neural networks can develop compositional in-context learning algorithms by training on relatively unstructured data.
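As a concrete illustration of how such abilities are probed, the sketch below assembles a few-shot prompt in the spirit of the pseudoword paradigm of Lake and Baroni (2023); the particular pseudowords and the `complete` call are placeholders of our own, not a real dataset or API.

```python
# Probing compositional in-context learning in a pretrained language model:
# the prompt supplies demonstrations with familiar parts, then queries a
# novel combination. No weights are updated; any generalization must come
# from the frozen model's forward pass over the context.
demonstrations = [
    ("dax", "RED"),
    ("wif", "GREEN"),
    ("dax fep", "RED RED RED"),  # 'fep' composes: repeat the color thrice
]
query = "wif fep"  # held out; the compositional answer is "GREEN GREEN GREEN"

prompt = "\n".join(f"{x} -> {y}" for x, y in demonstrations) + f"\n{query} ->"

# `complete` stands in for any text-completion interface (local model or
# API); it is a placeholder, not a real library function.
# response = complete(prompt)
print(prompt)
```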

Binz et al. shy away from a robust commitment to meta-learning as a theoretical framework, instead emphasizing its utility as a methodological tool. Here, we have demonstrated how the meta-learning perspective on human compositionality can generate testable empirical hypotheses about underlying mechanisms and developmental trajectory. If such a research program bears fruit, it will elevate meta-learning from a useful tool to a novel cognitive theory.

Financial support

M. J. F. is supported by ONR grant N00014-23-1-2792. E. P. and J. R. are supported by COBRE grant no. 5P20GM103645-10.

Competing interest

None.

References

Bergelson, E. (2020). The comprehension boost in early word learning: Older infants are better learners. Child Development Perspectives, 14(3), 142–149. https://doi.org/10.1111/cdep.12373
Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., … (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901. https://papers.nips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html
Calderon, C. B., Verguts, T., & Frank, M. J. (2022). Thunderstruck: The ACDC model of flexible sequences and rhythms in recurrent neural circuits. PLoS Computational Biology, 18(2), e1009854. https://doi.org/10.1371/journal.pcbi.1009854
Chan, S. C. Y., Santoro, A., Lampinen, A. K., Wang, J. X., Singh, A., Richemond, P. H., … Hill, F. (2022). Data distributional properties drive emergent in-context learning in transformers. Advances in Neural Information Processing Systems, 35, 18878–18891. https://papers.nips.cc/paper_files/paper/2022/hash/77c6ccacfd9962e2307fc64680fc5ace-Abstract-Conference.html
Chomsky, N. (1957). Syntactic structures. Mouton & Co.
Collins, A. G. E., & Frank, M. J. (2013). Cognitive control over learning: Creating, clustering and generalizing task-set structure. Psychological Review, 120(1), 190–229. https://doi.org/10.1037/a0030852
Crescentini, C., Seyed-Allaei, S., De Pisapia, N., Jovicich, J., Amati, D., & Shallice, T. (2011). Mechanisms of rule acquisition and rule following in inductive reasoning. Journal of Neuroscience, 31(21), 7763–7774. https://doi.org/10.1523/JNEUROSCI.4579-10.2011
Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71. https://doi.org/10.1016/0010-0277(88)90031-5
Frank, M. J., & Badre, D. (2012). Mechanisms of hierarchical reinforcement learning in corticostriatal circuits 1: Computational analysis. Cerebral Cortex, 22(3), 509–526. https://doi.org/10.1093/cercor/bhr114
Goel, V. (2007). Anatomy of deductive reasoning. Trends in Cognitive Sciences, 11(10), 435–441. https://doi.org/10.1016/j.tics.2007.09.003
Kriete, T., Noelle, D. C., Cohen, J. D., & O'Reilly, R. C. (2013). Indirection and symbol-like processing in the prefrontal cortex and basal ganglia. Proceedings of the National Academy of Sciences of the United States of America, 110(41), 16390–16395. https://doi.org/10.1073/pnas.1303547110
Lake, B. M., & Baroni, M. (2018). Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Dy, J. G. & Krause, A. (Eds.), Proceedings of the 35th International Conference on Machine Learning (Vol. 80, pp. 2879–2888). PMLR. http://proceedings.mlr.press/v80/lake18a.html
Lake, B. M., & Baroni, M. (2023). Human-like systematic generalization through a meta-learning neural network. Nature, 623, 1–7. https://doi.org/10.1038/s41586-023-06668-3
Linzen, T., & Baroni, M. (2021). Syntactic structure from deep learning. Annual Review of Linguistics, 7(1), 195–212. https://doi.org/10.1146/annurev-linguistics-032020-051035
Marcus, G. F. (1998). Rethinking eliminative connectionism. Cognitive Psychology, 37(3), 243–282. https://doi.org/10.1006/cogp.1998.0694
Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24, 167–202.
Munakata, Y., Snyder, H. R., & Chatham, C. H. (2012). Developing cognitive control: Three key transitions. Current Directions in Psychological Science, 21(2), 71–77. https://doi.org/10.1177/0963721412436807
O'Reilly, R. C., & Frank, M. J. (2006). Making working memory work: A computational model of learning in the prefrontal cortex and basal ganglia. Neural Computation, 18(2), 283–328. https://doi.org/10.1162/089976606775093909
Piantadosi, S., & Aslin, R. (2016). Compositional reasoning in early childhood. PLoS ONE, 11(9), e0147734. https://doi.org/10.1371/journal.pone.0147734
Piantadosi, S. T., Palmeri, H., & Aslin, R. (2018). Limits on composition of conceptual operations in 9-month-olds. Infancy, 23(3), 310–324. https://doi.org/10.1111/infa.12225
Rougier, N. P., Noelle, D., Braver, T. S., Cohen, J. D., & O'Reilly, R. C. (2005). Prefrontal cortex and the flexibility of cognitive control: Rules without symbols. Proceedings of the National Academy of Sciences of the United States of America, 102(20), 7338–7343.
Russin, J., Jo, J., O'Reilly, R. C., & Bengio, Y. (2020a). Systematicity in a recurrent neural network by factorizing syntax and semantics. Proceedings of the 42nd Annual Meeting of the Cognitive Science Society. https://cognitivesciencesociety.org/cogsci20/papers/0027/0027.pdf
Russin, J., O'Reilly, R. C., & Bengio, Y. (2020b). Deep learning needs a prefrontal cortex. Bridging AI and Cognitive Science (BAICS) Workshop, ICLR 2020.
Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence, 46(1–2), 159–216. https://doi.org/10.1016/0004-3702(90)90007-M
Thompson-Schill, S. L. (2005). Dissecting the language organ: A new look at the role of Broca's area in language processing. In Cutler, A. (Ed.), Twenty-first century psycholinguistics (1st ed., Vol. 1, pp. 1–18). Routledge.
von Oswald, J., Niklasson, E., Schlegel, M., Kobayashi, S., Zucchet, N., Scherrer, N., … Sacramento, J. (2023). Uncovering mesa-optimization algorithms in transformers (arXiv:2309.05858). arXiv. https://doi.org/10.48550/arXiv.2309.05858
Webb, T., Frankland, S. M., Altabaa, A., Krishnamurthy, K., Campbell, D., Russin, J., … Cohen, J. D. (2024). The relational bottleneck as an inductive bias for efficient abstraction (arXiv:2309.06629). arXiv. http://arxiv.org/abs/2309.06629
Webb, T., Holyoak, K. J., & Lu, H. (2022). Emergent analogical reasoning in large language models. Nature Human Behaviour, 7(9). https://doi.org/10.1038/s41562-023-01659-w
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., … Zhou, D. (2023). Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35, 24824–24837. https://papers.nips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf
Werchan, D. M., Collins, A. G. E., Frank, M. J., & Amso, D. (2015). 8-Month-old infants spontaneously learn and generalize hierarchical rules. Psychological Science, 26(6), 805–815. https://doi.org/10.1177/0956797615571442
Werchan, D. M., Collins, A. G. E., Frank, M. J., & Amso, D. (2016). Role of prefrontal cortex in learning and generalizing hierarchical rules in 8-month-old infants. The Journal of Neuroscience, 36(40), 10314–10322. https://doi.org/10.1523/JNEUROSCI.1351-16.2016
Xie, S. M., Raghunathan, A., Liang, P., & Ma, T. (2022). An explanation of in-context learning as implicit Bayesian inference. International Conference on Learning Representations. https://openreview.net/pdf?id=RdJVFCHjUMI
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., … Chi, E. (2022). Least-to-most prompting enables complex reasoning in large language models. The Eleventh International Conference on Learning Representations. https://openreview.net/forum?id=WZH7099tgfM