
Where do the hypotheses come from? Data-driven learning in science and the brain

Published online by Cambridge University Press: 06 December 2023

Barton L. Anderson
Affiliation:
School of Psychology, University of Sydney, Sydney, Australia barton.anderson@sydney.edu.au
Katherine R. Storrs
Affiliation:
Department of Psychology, University of Auckland, Auckland, New Zealand katherine.storrs@gmail.com
Roland W. Fleming
Affiliation:
Department of Psychology, Justus Liebig University of Giessen, Giessen, Germany roland.w.fleming@psychol.uni-giessen.de; Center for Mind, Brain and Behavior, Universities of Marburg and Giessen, Giessen, Germany

Abstract

Everyone agrees that testing hypotheses is important, but Bowers et al. provide scant details about where hypotheses about perception and brain function should come from. We suggest that the answer lies in considering how information about the outside world could be acquired – that is, learned – over the course of evolution and development. Deep neural networks (DNNs) provide one tool to address this question.

Type
Open Peer Commentary

Copyright
© The Author(s), 2023. Published by Cambridge University Press

Bowers et al. argue that we need to go beyond hypothesis-blind benchmarking in assessing models of vision. On these points we agree: Benchmarking massively complex models on small and arbitrary brain and behavioural datasets is unlikely to yield satisfying outcomes. The space of models and stimuli is too vast, many models score similarly (e.g., Storrs et al., 2021a), even bad models can score highly in constrained settings, and the approach gives little insight into what models get right or wrong about biological vision. Bowers et al. also advocate for the importance of hypothesis-driven research, and here too we agree. However, this approach is fraught with challenges for which the authors offer little in the way of solutions. Where do principled hypotheses come from? What theoretical considerations should constrain the hypotheses we consider? We suggest that the different answers that have been proposed to such questions can provide insight into the role that deep neural networks (DNNs) might play in understanding perception and the brain more broadly.

One approach is to begin with what our visual systems seem to do and work backwards to “reverse engineer” the brain. This approach was advocated by Marr, and has been taken up by Bayesian approaches that treat visual processing as the solution to a set of “natural tasks.” The idea is that natural selection shaped our brains to approximate “ideal observers” of some set of environmental properties. But natural selection can only act retrospectively; it provides no insight into the genesis of the “options” that it “chooses” between. In the case of vision, it could select between brains that were better at extracting some world properties than others, but it provides no insight into how brains discovered that those environmental properties exist. Something more is needed to explain how the “natural tasks” the brain putatively solves were discovered.

It is here we believe the idea of “data-driven” processes plays a fundamental role. The only known mechanism for getting knowledge about the world into our heads is through our senses. As our brains were not given a list of scene variables they need to estimate, they had to discover properties of the world based on the “diet” of images they experienced over the course of evolution and development. This simple (and seemingly tautological) assertion has a profound theoretical ramification: It implies that anything our brains extract about the world must be based on information contained in, or derivable from, the input. Two routes to defining “principled hypotheses” about visual function follow from this: (1) we should identify what that information is, that is, explore how what we experience about the world relates to what exists in the input; and (2) we should identify how sensitivity to that information was acquired (learned) over the course of evolution and development, that is, explore the mechanisms that underlie sensitivity to these quantities. These two ideas are not independent. Understanding how information can be learned from images can provide insight into what is learned, and understanding what information is used can constrain attempts to construct a learning process that could become sensitive to it.

To ground this idea, consider an example from our recent work. Psychophysical studies revealed that the subjective perception of surface gloss depends not only on the physical specular reflectance of surfaces, but also on other, physically independent scene properties such as shape and illumination (Ho, Landy, & Maloney, 2008). Further experiments revealed that these perceptual errors were caused by differences in the spatial structure and distribution of specular reflections in the image, which relates our experience of a world property (gloss) to properties of images (Marlow, Kim, & Anderson, 2012). We then showed that a system trained to recover “ground truth” (i.e., trained to learn a mapping between images and gloss) failed to predict human judgements. In contrast, unsupervised DNNs, designed to summarise and predict properties of the input, learned representations exhibiting the same pattern of successes and errors in perceived gloss as humans (Storrs, Anderson, & Fleming, 2021b). In essence, the unsupervised DNN partially – but imperfectly – disentangled the different scene variables (here, gloss, shape, and illumination) in similar ways as our visual system. DNNs thus provided insight into how such illusions could result from an (imperfect) learning process.
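To make the contrast between the two training regimes concrete, here is a deliberately simplified sketch. It is not the architecture used in Storrs et al. (2021b); the toy data, layer sizes, and 16-dimensional latent code are illustrative assumptions only. The point is that the supervised model is given the scene variable it should estimate, whereas the unsupervised model must discover whatever latent structure helps it summarise the images themselves.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for a dataset of rendered glossy surfaces:
# `images` are the renderings; `gloss` is ground-truth specular
# reflectance, used only by the supervised model.
images = torch.rand(256, 3, 32, 32)
gloss = torch.rand(256, 1)

def make_encoder():
    return nn.Sequential(
        nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 16),  # 16-D latent code
    )

# Supervised route: regress ground-truth gloss directly from the image.
supervised = nn.Sequential(make_encoder(), nn.Linear(16, 1))

# Unsupervised route: the same encoder, trained only to reconstruct its
# input. To reconstruct well, the latent code must (partially) separate
# the scene variables (gloss, shape, illumination) that generated the image.
decoder = nn.Sequential(
    nn.Linear(16, 32 * 8 * 8), nn.ReLU(),
    nn.Unflatten(1, (32, 8, 8)),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
autoencoder = nn.Sequential(make_encoder(), decoder)

def train(model, targets, epochs=5):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(images), targets)
        loss.backward()
        opt.step()

train(supervised, gloss)      # learns an image -> gloss mapping from labels
train(autoencoder, images)    # learns from images alone, no scene labels
```

The interesting analysis then happens after training: probing how well gloss, shape, and illumination can be read out from each model's latent representation, and comparing each model's pattern of errors with human judgements.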

How might we go about discovering the information in images that our visual systems use, and what role might DNNs play in hypothesis formation and/or model development? Supervised DNNs of the variety Bowers et al. focus on are inherently teleological; they start with a goal, and coerce the system towards that goal through explicit training on the distinctions the modeller wants the system to make (Yamins & DiCarlo, 2016). Unsupervised or self-supervised DNNs are techniques for finding similarities and differences between general features based on statistical properties of the input. One view of such networks is that they are generalised “covariance detectors,” driven largely by how different image properties do or do not covary. The idea that the visual system derives information about scene properties from the way that different types of image structure covary has provided recent leverage in understanding how the brain extracts the shape and material properties of surfaces (Anderson & Marlow, 2023; Marlow & Anderson, 2021; Marlow, Mooney, & Anderson, 2019).
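As a minimal illustration of what “covariance detector” means here, the sketch below constructs hypothetical per-image measurements in which two image properties covary because they are driven by the same hidden scene variable, and shows that the leading eigenvector of their covariance matrix (the simplest unsupervised learner) picks out that shared dimension. The measurement names and the generative assumptions are placeholders, not claims about which image statistics the visual system actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical per-image measurements. "gloss" is a hidden scene variable;
# highlight sharpness and coverage both covary with it, while a third
# measurement (mean luminance) varies independently.
gloss = rng.normal(size=n)
sharpness = 0.9 * gloss + 0.1 * rng.normal(size=n)
coverage = 0.8 * gloss + 0.2 * rng.normal(size=n)
luminance = rng.normal(size=n)

X = np.stack([sharpness, coverage, luminance], axis=1)
X -= X.mean(axis=0)

# A "covariance detector" in its simplest form: the leading eigenvector
# of the measurement covariance matrix.
cov = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(cov)
leading = eigvecs[:, -1]

# Loads heavily on the two covarying measurements, i.e., it recovers a
# direction aligned with the hidden "gloss" variable without ever being
# told that such a variable exists.
print(leading)
```

Nonlinear unsupervised and self-supervised DNNs can be seen as generalising this logic to high-dimensional, structured covariation in natural images.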

What role might DNNs take in evaluating different computational models of vision? We agree that the psychology and psychophysics literatures provide a wealth of excellent starting points for testing candidate computational models of vision. However, human ingenuity is not always well suited to designing stimuli or experiments that can differentiate between multiple computationally complex alternative models – and such complexity will be unavoidable if we seek to capture human vision in even broad strokes. Therefore, we also see value in using automated selection of complex stimuli to maximally differentiate complex models (e.g., Golan, Raju, & Kriegeskorte, 2020; Wang & Simoncelli, 2008). More broadly, deep learning provides a means to instantiate different hypotheses about how vision is acquired, and the impacts this has on the mature visual system. There is a lot more to deep learning than benchmarking.
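For concreteness, here is a minimal sketch of the stimulus-synthesis logic behind “controversial stimuli” and MAD-style competition: start from a random image and adjust it by gradient ascent so that two candidate models disagree as much as possible about it. The two linear “models” and the simple disagreement objective below are placeholders; the published methods use richer models and more carefully constructed objectives.

```python
import torch
import torch.nn as nn

# Two stand-in "models of vision"; in practice these would be the
# candidate models one wants to tell apart.
model_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

# Start from noise and optimise the stimulus itself, not the models.
stimulus = torch.rand(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([stimulus], lr=0.05)

for _ in range(200):
    opt.zero_grad()
    p_a = torch.softmax(model_a(stimulus), dim=1)
    p_b = torch.softmax(model_b(stimulus), dim=1)
    # Minimising the negative gap between the two models' predicted
    # category distributions maximises their disagreement.
    loss = -(p_a - p_b).abs().sum()
    loss.backward()
    opt.step()
    with torch.no_grad():
        stimulus.clamp_(0.0, 1.0)  # keep pixel values in image range
```

Showing the resulting stimulus to human observers then indicates which model (if either) better predicts perception where the models diverge most.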

Financial support

This work was supported by funding (DP210102218) from the Australian Research Council to B. L. A.; by a Marsden Fast Start grant (MFP-UOA2109) from the Royal Society of New Zealand to K. R. S.; and by the DFG (222641018-SFB/TRR-135 TP C1) and the Research Cluster “The Adaptive Mind,” funded by the Hessian Ministry for Higher Education, Research, Science and the Arts, to R. W. F.

Competing interest

None.

References

Anderson, B. L., & Marlow, P. J. (2023). Perceiving the shape and material properties of 3D surfaces. Trends in Cognitive Sciences, 27(1), 98–110. doi:10.1016/j.tics.2022.10.005
Golan, T., Raju, P. C., & Kriegeskorte, N. (2020). Controversial stimuli: Pitting neural networks against each other as models of human cognition. Proceedings of the National Academy of Sciences of the United States of America, 117(47), 29330–29337.
Ho, Y. X., Landy, M. S., & Maloney, L. T. (2008). Conjoint measurement of gloss and surface texture. Psychological Science, 19, 196–204.
Marlow, P., & Anderson, B. (2021). The cospecification of the shape and material properties of light permeable materials. Proceedings of the National Academy of Sciences of the United States of America, 118(14), e2024798118.
Marlow, P., Kim, J., & Anderson, B. (2012). The perception and misperception of specular surface reflectance. Current Biology, 22(20), 15.
Marlow, P., Mooney, S., & Anderson, B. (2019). Photogeometric cues to perceived surface shading. Current Biology, 29(2), 306–311.
Storrs, K. R., Anderson, B. L., & Fleming, R. W. (2021b). Unsupervised learning predicts human perception and misperception of gloss. Nature Human Behaviour, 5, 1402–1417. https://doi.org/10.1038/s41562-021-01097-6
Storrs, K. R., Kietzmann, T. C., Walther, A., Mehrer, J., & Kriegeskorte, N. (2021a). Diverse deep neural networks all predict human inferior temporal cortex well, after training and fitting. Journal of Cognitive Neuroscience, 33(10), 2044–2064.
Wang, Z., & Simoncelli, E. P. (2008). Maximum differentiation (MAD) competition: A methodology for comparing computational models of perceptual quantities. Journal of Vision, 8(12), 8.
Yamins, D., & DiCarlo, J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19, 356–365. https://doi.org/10.1038/nn.4244