
Challenges of meta-learning and rational analysis in large worlds

Published online by Cambridge University Press: 23 September 2024

Margherita Calderan
Affiliation: Department of Developmental Psychology and Socialisation, University of Padova, Italy. Email: margherita.calderan@phd.unipd.it

Antonino Visalli*
Affiliation: IRCCS San Camillo Hospital, Venice, Italy. Email: antonino.visalli@hsancamillo.it

*Corresponding author.

Abstract

We challenge Binz et al.'s claim that meta-learned models are superior to Bayesian inference for large world problems. Comparing Bayesian priors to model-training decisions, we question whether the features of meta-learning are truly exclusive to it. Arguing that there is no special justification for rational Bayesian solutions to large world problems, we advocate exploring theoretical frameworks beyond the rational analysis of cognition to advance the research program.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

Binz et al. (argument 2) advocate for the superiority of meta-learned models over Bayesian inference for addressing large world problems (Savage, 1972). Our commentary questions what we perceive as fallacies in their arguments.

First, although we recognize that "Identifying the correct set of assumptions becomes especially challenging once we deal with more complex problems," we point out that meta-learned models also require specific assumptions. Examples include the selection of samples from the data-generating distribution, the choice of optimizer, weight initializations, and constraints introduced to mimic bounded rationality. These decisions, too, can be conceived as priors and require a certain level of justification; Binz et al. themselves emphasize the importance of making these choices appropriately (sect. 4, "Intricate Training Processes"). Bayesian or not, prior knowledge is a necessary condition for both modeling procedures. We therefore contend that Bayesian and meta-learned models face similar challenges from a rational perspective. Why should it be "hard to justify" prior assumptions for Bayesian models but not for meta-learned models? For instance, consider the critiques directed at Lucas, Griffiths, Williams, and Kalish (2015). To account for the bias toward expecting linear relationships between continuous variables, the authors assigned lower prior probabilities to quadratic and radial relationships than to linear ones (Lucas et al., 2015). Binz et al. raise the issue that the chosen prior might not capture all possible functions (and their associated probabilities). However, similar concerns arise for meta-learned models. What justifies the selection of the training data? How does one determine which functions to employ in the tasks used to train the model? More fundamentally, on which tasks should the model be trained at all? Are these decisions easier to justify from a rational perspective than their Bayesian counterparts? If defining the priors is considered a main obstacle to applying Bayesian inference to large world problems, the same challenge extends to the decisions listed above, which determine the initial parameterization of meta-learned models and can be conceived as the equivalent of a "prior" (Griffiths et al., 2019). Finally, if the inability to "have access to a prior or a likelihood" is the main obstacle in large world problems, what is "the unique feature of meta-learned models" compared to other Bayesian methods that construct their own empirical priors (e.g., hierarchical models and empirical Bayes; Friston & Stephan, 2007) or that bypass the evaluation of the likelihood function (e.g., approximate Bayesian computation: Beaumont, 2010; likelihood-free inference: Papamakarios, Nalisnick, Rezende, Mohamed, & Lakshminarayanan, 2021; simulation-based inference: Cranmer, Brehmer, & Louppe, 2020)?
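To make the parallel between priors and training decisions concrete, the following minimal sketch generates training tasks for a hypothetical function-learning meta-learner. Everything here (the function families, the mixture weights, the noise level) is our own illustrative assumption, not taken from the target article or from Lucas et al.: the point is only that the mixture weights play exactly the role of the prior probabilities that Lucas et al. (2015) had to justify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical task-generating distribution for meta-training.
# Choosing these weights is a modeling decision that requires the
# same kind of justification as a Bayesian prior over functions.
FAMILY_WEIGHTS = {"linear": 0.8, "quadratic": 0.1, "radial": 0.1}

def sample_task(n_points=20):
    """Sample one training task (x, y pairs) from the task distribution."""
    family = rng.choice(list(FAMILY_WEIGHTS), p=list(FAMILY_WEIGHTS.values()))
    x = rng.uniform(-1.0, 1.0, size=n_points)
    if family == "linear":
        y = rng.normal(0, 1) * x + rng.normal(0, 1)
    elif family == "quadratic":
        y = rng.normal(0, 1) * x**2 + rng.normal(0, 1) * x
    else:  # radial: a Gaussian bump with a random center
        y = np.exp(-(x - rng.uniform(-1.0, 1.0)) ** 2 / 0.1)
    return x, y + rng.normal(0, 0.05, size=n_points)  # observation noise

# Every task fed to the meta-learner commits it to the assumptions
# encoded above, just as a Bayesian model commits to its prior.
x_train, y_train = sample_task()
```

Whether these assumptions are written down as a prior or baked into a task generator, the justificatory burden is the same.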

Second, it should be noted that the "meta-learning" feature is not exclusive to meta-learned models in machine learning; it can also be achieved with hierarchical Bayesian models (Grant, Finn, Levine, Darrell, & Griffiths, 2018; Griffiths et al., 2019; Kemp, Perfors, & Tenenbaum, 2007; Li, Callaway, Thompson, Adams, & Griffiths, 2023). Hence, if meta-learning is taken as what enables computational models to face large world problems, it cannot be used as an argument in favor of meta-learned models over hierarchical Bayesian inference.
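As an illustration (the model and all parameter values below are our own toy assumptions, not drawn from the cited papers), meta-learning can be expressed as hierarchical Bayesian inference: a population-level prior is estimated from many past tasks and then guides inference on a new, data-poor task, in the spirit of Kemp et al.'s (2007) overhypotheses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy hierarchical model: each task t has its own mean theta_t, and all
# theta_t are drawn from a population-level Gaussian N(mu, TAU2).
# Estimating mu across tasks ("learning the prior") is meta-learning
# in Bayesian form.
TAU2, SIGMA2 = 1.0, 0.25  # assumed between- and within-task variances
true_mu = 2.0             # population mean, unknown to the learner

# Experience: a few observations from each of many past tasks.
past_means = rng.normal(true_mu, np.sqrt(TAU2), size=50)
past_data = [rng.normal(m, np.sqrt(SIGMA2), size=5) for m in past_means]

# Empirical-Bayes estimate of the population-level prior mean.
mu_hat = np.mean([d.mean() for d in past_data])

# On a NEW task with only two observations, the learned prior does the
# heavy lifting: the posterior mean shrinks the sample toward mu_hat.
new_obs = rng.normal(true_mu + 1.0, np.sqrt(SIGMA2), size=2)
n = len(new_obs)
post_mean = (mu_hat / TAU2 + new_obs.sum() / SIGMA2) / (1 / TAU2 + n / SIGMA2)
print(f"learned prior mean: {mu_hat:.2f}; posterior mean for new task: {post_mean:.2f}")
```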

Putting together our first two concerns, we think that a fairer comparison between meta-learned models (as defined in the target article) and hierarchical (approximate) Bayesian models would have been necessary to assert that meta-learned models contain "unique" features for addressing large world problems.

A further concern regards meta-learning as a solution to large world problems. Following Binmore (2007), the distinction between small and large worlds can be interpreted as the distinction between making decisions under risk and under uncertainty, respectively. In the first case, decision makers know all the contingencies of the problem and can fully apply Bayes' rule to make the optimal decision. Large world problems are instead situations characterized by uncertainty about the causes and likelihoods of events. In other words, large world problems can be conceived as situations in which previously acquired environmental assumptions no longer hold. However, if meta-learned models need to be retrained whenever environmental assumptions differ from those encountered during training, it follows that their use can be justified only in small worlds, where previous knowledge can be used to make choices.
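A toy illustration of this retraining problem follows. The "meta-trained" predictor here is just a shared polynomial fit standing in for a meta-learned model, and both environments are invented; only the dependence on the training distribution matters.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_tasks(sample_fn, n_tasks=20, n_points=30):
    """Draw tasks from an environment defined by a function sampler."""
    tasks = []
    for _ in range(n_tasks):
        f = sample_fn()  # one task = one function drawn from the environment
        x = rng.uniform(-1, 1, n_points)
        tasks.append((x, f(x) + rng.normal(0, 0.05, n_points)))
    return tasks

def meta_train(tasks, degree=1):
    """Fit one shared predictor across all training tasks (a stand-in
    for a meta-learned model)."""
    xs = np.concatenate([x for x, _ in tasks])
    ys = np.concatenate([y for _, y in tasks])
    return np.poly1d(np.polyfit(xs, ys, degree))

# Training environment: near-linear tasks (the "small world" assumption).
train_env = make_tasks(lambda: (lambda x, a=rng.normal(2, 0.1): a * x))
# Shifted environment: the training assumption no longer holds.
shifted_env = make_tasks(lambda: (lambda x: np.sin(6 * x)))

model = meta_train(train_env)
for name, tasks in [("training environment", train_env),
                    ("shifted environment", shifted_env)]:
    mse = np.mean([np.mean((model(x) - y) ** 2) for x, y in tasks])
    print(f"{name}: MSE = {mse:.3f}")
# Recovering performance in the shifted environment requires retraining,
# i.e., prior knowledge of the new environment -- a small-world move.
```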

Finally, it should be highlighted that the target article grounds meta-learned models in the rational analysis framework (Anderson, 1991), given that they approximate Bayes-optimal solutions. However, Savage's and Binmore's argument was precisely that there is no special justification for rational Bayesian solutions to large world problems. In our opinion, if one adheres to this rational perspective, neither Bayesian nor meta-learned models can be considered suitable for modeling decision making under uncertainty. A possible way out of this impasse, however, can come from psychological and cognitive research fields that have investigated decision making under uncertainty. Theoretical frameworks such as the free-energy principle (Friston et al., 2023) and reinforcement learning (Dimitrakakis & Ortner, 2022; Kochenderfer, 2015) have investigated how learning under uncertainty occurs and how it is used to construct beliefs that guide decisions in situations where the causes of events are unknown. In our opinion, implementing ideas from these frameworks in computational models is a promising way to address large world problems.
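As a minimal illustration of the kind of mechanism we have in mind (the environment and all parameters are purely illustrative), consider Thompson sampling on a Bernoulli bandit, a standard reinforcement-learning scheme in which the agent does not know the reward structure in advance but constructs beliefs from experience and acts on them.

```python
import numpy as np

rng = np.random.default_rng(3)

# Thompson sampling on a three-armed Bernoulli bandit. The agent starts
# with uniform Beta(1, 1) beliefs about each arm's unknown reward
# probability and updates them from outcomes; decisions are guided by
# sampled beliefs rather than by a fully specified world model.
true_p = np.array([0.3, 0.5, 0.7])  # unknown to the agent
alpha = np.ones(3)                  # Beta parameters: successes + 1
beta = np.ones(3)                   # Beta parameters: failures + 1

for _ in range(1000):
    sampled = rng.beta(alpha, beta)  # one draw per arm from current beliefs
    arm = int(np.argmax(sampled))    # act greedily on the sampled world
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward             # belief update from the outcome
    beta[arm] += 1 - reward

print("posterior means:", np.round(alpha / (alpha + beta), 2))
```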

In conclusion, we do not think that Binz et al. have provided convincing support for the claim that "The ability to construct Bayes-optimal learning algorithms for large world problems is a unique feature of the meta-learning framework." We suggest that grounding meta-learned models in the rational analysis of cognition is not sufficient for modeling decisions in large worlds, and that exploring and integrating other theoretical frameworks could offer valuable insights to advance their research program.

Financial support

This research received no specific grant from any funding agency, commercial or not-for-profit sectors.

Competing interest

None.

References

Anderson, J. R. (1991). Is human cognition adaptive? Behavioral and Brain Sciences, 14(3), 471–485.
Beaumont, M. A. (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics, 41, 379–406.
Binmore, K. (2007). Rational decisions in large worlds. Annales d'Economie et de Statistique, 86, 25–41.
Cranmer, K., Brehmer, J., & Louppe, G. (2020). The frontier of simulation-based inference. Proceedings of the National Academy of Sciences, 117(48), 30055–30062.
Dimitrakakis, C., & Ortner, R. (2022). Decision making under uncertainty and reinforcement learning: Theory and algorithms (Vol. 223). Springer Nature.
Friston, K. J., Da Costa, L., Sajid, N., Heins, C., Ueltzhöffer, K., Pavliotis, G. A., & Parr, T. (2023). The free energy principle made simpler but not too simple. Physics Reports, 1024, 1–29.
Friston, K. J., & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159, 417–458.
Grant, E., Finn, C., Levine, S., Darrell, T., & Griffiths, T. (2018). Recasting gradient-based meta-learning as hierarchical Bayes. arXiv preprint arXiv:1801.08930.
Griffiths, T. L., Callaway, F., Chang, M. B., Grant, E., Krueger, P. M., & Lieder, F. (2019). Doing more with less: Meta-reasoning and meta-learning in humans and machines. Current Opinion in Behavioral Sciences, 29, 24–30.
Kemp, C., Perfors, A., & Tenenbaum, J. B. (2007). Learning overhypotheses with hierarchical Bayesian models. Developmental Science, 10(3), 307–321.
Kochenderfer, M. J. (2015). Decision making under uncertainty: Theory and application. MIT Press.
Li, M. Y., Callaway, F., Thompson, W. D., Adams, R. P., & Griffiths, T. L. (2023). Learning to learn functions. Cognitive Science, 47(4), e13262.
Lucas, C. G., Griffiths, T. L., Williams, J. J., & Kalish, M. L. (2015). A rational model of function learning. Psychonomic Bulletin & Review, 22(5), 1193–1215.
Papamakarios, G., Nalisnick, E., Rezende, D. J., Mohamed, S., & Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. The Journal of Machine Learning Research, 22(1), 2617–2680.
Savage, L. J. (1972). The foundations of statistics. Courier Corporation.