
Meta-learning as a bridge between neural networks and symbolic Bayesian models

Published online by Cambridge University Press:  23 September 2024

R. Thomas McCoy
Affiliation:
Department of Linguistics, Yale University, New Haven, CT, USA tom.mccoy@yale.edu https://rtmccoy.com/
Thomas L. Griffiths*
Affiliation:
Departments of Psychology and Computer Science, Princeton University, Princeton, NJ, USA tomg@princeton.edu http://cocosci.princeton.edu/tom/
*Corresponding author.

Abstract

Meta-learning is even more broadly relevant to the study of inductive biases than Binz et al. suggest: Its implications go beyond the extensions to rational analysis that they discuss. One noteworthy example is that meta-learning can act as a bridge between the vector representations of neural networks and the symbolic hypothesis spaces used in many Bayesian models.

Type: Open Peer Commentary
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Like many aspects of cognition, learning can be analyzed at multiple levels. At a high level (Marr's [1982] “computational” level) we can model learning by providing an abstract characterization of the learner's inductive biases: The preferences that the learner has for some types of generalizations over others (Mitchell, 1997). At a lower level, learning can be modeled by specifying the particular algorithms and representations that the learner uses to realize its inductive biases. For each of these levels, there are modeling traditions that have been successful: Rational analysis and Bayesian models are defined at the computational level, while neural networks are defined at the level of algorithm and representation. But how can we connect these different traditions? How can we work toward unified theories that bridge the divide between levels? In this piece, we agree with, and extend, Binz et al.'s point that meta-learning is a powerful tool for studying inductive biases in a way that spans levels of analysis.

Binz et al. describe how an agent can use meta-learning to derive inductive biases from its environment. This makes meta-learning well-suited for modeling situations where human inductive biases align with some problem that humans face – the situations that are well-covered by the paradigm of rational analysis (Anderson, 1990). As Binz et al. discuss, meta-learning can therefore be used to enable an algorithmically defined model (such as a neural network) to find the solution predicted by rational analysis, a procedure that bridges the divide between abstract rational solutions and specific algorithmic instantiations.

This direction laid out by Binz et al. is exciting. We argue that it can in fact be viewed as one special case within a broader space of possible lines of inquiry about inductive biases that meta-learning opens up. In the more general case, the Bayesian perspective allows us to define an inductive bias as a probability distribution over hypotheses. A neural network can meta-learn from data sampled from this distribution, giving it the inductive bias in question. The distribution that is used could be drawn from (an approximation of) a human's experience, in which case this framing matches the extension of rational analysis that Binz et al. advocate for. But the distribution can also be defined in other ways, corresponding to any probabilistic model. Because the probabilistic model is fully under our control, using it to define the distribution makes it possible to control the inductive biases that the meta-learned model ends up with (Lake, 2019; Lake & Baroni, 2023; McCoy, Grant, Smolensky, Griffiths, & Linzen, 2020). This allows us to take an inductive bias defined at Marr's computational level and distill it into a neural network defined at the level of algorithm and representation.
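As a minimal illustration of this recipe (a sketch of our own, not code from any of the cited papers), the snippet below shows how a prior over hypotheses can be turned into meta-training episodes: each episode is generated by drawing a hypothesis from the prior and then sampling observations from that hypothesis. The toy prior over "allowed-symbol" languages and the function names (sample_hypothesis, sample_episode) are illustrative assumptions; any probabilistic model could play the same role.

```python
# Minimal sketch: converting a prior over hypotheses into meta-learning
# episodes. The prior here is a toy placeholder chosen for brevity.
import random

ALPHABET = "abcd"

def sample_hypothesis():
    """Draw a hypothesis from a toy prior: a formal language consisting of
    all strings over a randomly chosen subset of the alphabet."""
    k = random.randint(1, len(ALPHABET))
    return frozenset(random.sample(ALPHABET, k))

def sample_string(hypothesis, length=4):
    """Sample one observation consistent with the hypothesis (the likelihood)."""
    return "".join(random.choice(sorted(hypothesis)) for _ in range(length))

def sample_episode(n_support=5, n_query=5):
    """One meta-learning episode: support and query sets generated by a
    single hypothesis drawn from the prior."""
    h = sample_hypothesis()
    support = [sample_string(h) for _ in range(n_support)]
    query = [sample_string(h) for _ in range(n_query)]
    return support, query

# A network meta-trained on many such episodes is pushed toward the prior's
# inductive bias: after seeing a support set, it should generalize to the
# query set in the way that the prior favors.
if __name__ == "__main__":
    print(sample_episode())
```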

Traditionally, certain types of inductive biases have been associated with certain types of algorithms and representations: The strong inductive biases of Bayesian models have generally been based on discrete, symbolic representations (e.g., Goodman, Tenenbaum, Feldman, & Griffiths, 2008), while neural networks use continuous vector representations (Hinton, McClelland, & Rumelhart, 1986) and have weak inductive biases. However, meta-learning enables us to separately manipulate inductive biases and representations, making it possible to model previously inaccessible combinations of representations and inductive biases. One noteworthy example is that we can use meta-learning to give symbolic inductive biases to a neural network, allowing us to study whether and how structured hypothesis spaces (of the sort often used in Bayesian models) can be realized in a system with continuous vector representations (the type of representation that is central in both biological and artificial neural networks). Thus, while Binz et al. note that meta-learning can be used as an alternative to Bayesian models, another use of meta-learning is in fact to expand the applicability of Bayesian approaches by reconciling them with connectionist models – thereby bringing together two successful research traditions that have often been framed as antagonistic (e.g., Griffiths, Chater, Kemp, Perfors, & Tenenbaum, 2010; McClelland et al., 2010).

In our prior work, we have demonstrated the efficacy of this approach in the domain of language (McCoy & Griffiths, 2023). We started with a Bayesian model created by Yang and Piantadosi (2022), whose inductive bias is defined using a symbolic grammar. We then used meta-learning (specifically, MAML: Finn, Abbeel, & Levine, 2017; Grant, Finn, Levine, Darrell, & Griffiths, 2018) to distill this Bayesian model's prior into a neural network. The resulting system had strong inductive biases of the sort traditionally found only in symbolic models, enabling this system to learn formal linguistic patterns from small numbers of examples despite being a neural network, a class of systems that normally requires far more examples to learn such patterns. Additionally, the flexible neural implementation of this system made it possible to train it on naturalistic textual data, something that is intractable with the Bayesian model that we built on. Thus, meta-learning enabled the creation of a model that combined the complementary strengths of Bayesian and connectionist models of language learning.
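To indicate roughly what the distillation step involves, the sketch below shows a generic MAML-style meta-update in PyTorch. It is a simplified, assumed rendering for exposition (the function maml_meta_update and its arguments are placeholders), not the implementation used in the work described above; episodes are assumed to come from a prior-based generator like the one sketched earlier.

```python
# A simplified MAML-style meta-update (illustrative only, not the
# implementation from the cited work). Each episode is a tuple of tensors
# (support_x, support_y, query_x, query_y) generated from the prior.
import torch
from torch.func import functional_call

def maml_meta_update(model, loss_fn, meta_opt, episodes, inner_lr=0.01):
    names, params = zip(*model.named_parameters())
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in episodes:
        # Inner loop: one gradient step on the support set, keeping the graph
        # so the outer update can differentiate through the adaptation.
        inner_loss = loss_fn(model(support_x), support_y)
        grads = torch.autograd.grad(inner_loss, params, create_graph=True)
        adapted = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
        # Outer loss: how well the adapted parameters handle the query set.
        query_loss = loss_fn(functional_call(model, adapted, (query_x,)), query_y)
        query_loss.backward()  # gradients flow back to the shared initialization
    meta_opt.step()  # update the initialization so that adaptation succeeds

# Example usage (assumed shapes and loss):
# model = torch.nn.Linear(8, 2)
# meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# maml_meta_update(model, torch.nn.functional.cross_entropy, meta_opt, episodes)
```

After meta-training of this kind, the network's initialization encodes the prior: a few gradient steps on a handful of examples from a new task are enough to move the network toward the generalizations that the prior favors.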

These results show that inductive biases traditionally defined using symbolic Bayesian models can instead be realized inside a neural network. Therefore, symbolic inductive biases do not necessarily require inherently symbolic representations or algorithms. This demonstration provides one already-realized example of how meta-learning can advance our understanding of foundational questions about how different levels of cognition relate to each other, in ways that go beyond the realm of rational analysis.

Financial support

This material is based upon work supported by the National Science Foundation SBE Postdoctoral Research Fellowship under Grant No. 2204152 and the Office of Naval Research under Grant No. N00014-18-1-2873.

Competing interests

None.

References

Anderson, J. R. (1990). The adaptive character of thought. Psychology Press.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning (pp. 1126–1135).
Goodman, N. D., Tenenbaum, J. B., Feldman, J., & Griffiths, T. L. (2008). A rational analysis of rule-based concept learning. Cognitive Science, 32(1), 108–154.
Grant, E., Finn, C., Levine, S., Darrell, T., & Griffiths, T. (2018). Recasting gradient-based meta-learning as hierarchical Bayes. In International Conference on Learning Representations.
Griffiths, T. L., Chater, N., Kemp, C., Perfors, A., & Tenenbaum, J. B. (2010). Probabilistic models of cognition: Exploring representations and inductive biases. Trends in Cognitive Sciences, 14(8), 357–364.
Hinton, G. E., McClelland, J. L., & Rumelhart, D. E. (1986). Distributed representations. In Rumelhart, D. E. & McClelland, J. L. (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition, Vol. 1: Foundations (pp. 77–109). MIT Press.
Lake, B. M. (2019). Compositional generalization through meta sequence-to-sequence learning. Advances in Neural Information Processing Systems, 32.
Lake, B. M., & Baroni, M. (2023). Human-like systematic generalization through a meta-learning neural network. Nature, 623(7985), 115–121.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman.
McClelland, J. L., Botvinick, M. M., Noelle, D. C., Plaut, D. C., Rogers, T. T., Seidenberg, M. S., & Smith, L. B. (2010). Letting structure emerge: Connectionist and dynamical systems approaches to cognition. Trends in Cognitive Sciences, 14(8), 348–356.
McCoy, R. T., Grant, E., Smolensky, P., Griffiths, T. L., & Linzen, T. (2020). Universal linguistic inductive biases via meta-learning. In Proceedings of the 42nd Annual Conference of the Cognitive Science Society (pp. 737–743).
McCoy, R. T., & Griffiths, T. L. (2023). Modeling rapid language learning by distilling Bayesian priors into artificial neural networks. arXiv preprint arXiv:2305.14701.
Mitchell, T. M. (1997). Machine learning. McGraw-Hill.
Yang, Y., & Piantadosi, S. T. (2022). One model for the learning of language. Proceedings of the National Academy of Sciences, 119(5), e2021865119.