Common ground
As vision scientists, we believe that an understanding of human visual processing should ultimately explain all visually driven behavior. Because vision operates – by definition – on visual input, a science of human vision ultimately requires “image-computable” models and theories that produce those models. Bowers et al. endorse this view, as every psychology experiment they suggest focuses on the effects of manipulating combinations of image pixels.
On empirical tests of vision models
As empirical vision scientists, we also believe that advances in understanding visual processing will arise from rigorous, community-transparent tests of model predictions against empirical observations from the brain (e.g., patterns of neural firing) and the mind (e.g., patterns of behavior). As such, we and others have contributed to the creation of an open-source platform where any member of the vision community can find the leading models, test new models, see the most model-disruptive experimental benchmarks, and add new benchmarks (www.brain-score.org; Schrimpf et al., 2018, 2020).
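To make concrete what testing a model against a behavioral benchmark involves, the sketch below scores a model's per-condition accuracies against the corresponding human pattern, normalized by the reliability (ceiling) of that pattern. The function name, numbers, and structure are our own illustrative placeholders, not the actual Brain-Score API.

```python
# Illustrative sketch of scoring a model on a behavioral benchmark
# (hypothetical names and numbers, not the actual Brain-Score API).
import numpy as np


def score_model(model_accuracy: np.ndarray,
                human_accuracy: np.ndarray,
                human_ceiling: float) -> float:
    """Correlate model and human per-condition accuracies, normalized by the human ceiling."""
    raw_consistency = np.corrcoef(model_accuracy, human_accuracy)[0, 1]
    return raw_consistency / human_ceiling


# Toy per-condition accuracies (e.g., accuracy on 8 stimulus conditions).
human = np.array([0.92, 0.85, 0.70, 0.64, 0.88, 0.55, 0.73, 0.81])
model = np.array([0.95, 0.80, 0.75, 0.60, 0.90, 0.50, 0.70, 0.85])

print(score_model(model, human, human_ceiling=0.95))  # ceiling-normalized consistency
```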
The most constructive contribution of Bowers et al. is the identification of a set of human behavioral vision findings that the authors believe will not be well-predicted by currently leading deep artificial neural network (ANN) models (target article, sect. 4.1). To evaluate this claim, the Brain-Score community is turning these empirical findings into accessible benchmarks that current (and future) models of human visual processing can be evaluated on. The results of this evaluation, especially if these benchmarks indeed present a challenge for current ANN models, should – and we expect will – motivate next steps in human vision modeling. We report the following status at the time of this writing:
• We have implemented a benchmark based on Baker and Elder (2022). We find that some ANN vision models score within the noise ceiling of the human data, with the ceiling estimated by resampling the human data (a sketch of this comparison appears after this list).
• Two of the papers (Puebla & Bowers, 2022; Zhang, Bengio, Hardt, Recht, & Vinyals, 2021) evaluate the performance of some ANN models without a human reference. Thus these studies currently provide no empirical support for the target article's claim that current ANN models fail to capture human behavior. But human data could be collected to turn these into benchmarks.
• Three of the papers (Bowers & Jones, 2007; Mack, Gauthier, Sadr, & Palmeri, 2008; Saarela, Sayim, Westheimer, & Herzog, 2009) produced human behavioral measures – for example, reaction times – that ANN models do not yet have a standardized way of making predictions about. This is surmountable (e.g., Spoerer, Kietzmann, Mehrer, Charest, & Kriegeskorte, 2020), and we view this as a goal for future models and for Brain-Score.
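As referenced in the first bullet above, here is a minimal sketch of how a noise ceiling can be estimated by resampling human data: subjects are repeatedly split into two halves, and the correlation between the halves' per-condition accuracies gives a distribution of human-to-human consistency against which a model's consistency with the pooled human data can be judged. The procedure, toy data, and threshold below are our own illustrative assumptions, not the exact implementation of the Baker and Elder (2022) benchmark.

```python
# Illustrative noise-ceiling estimate via split-half resampling of human data
# (our own sketch; the benchmark's exact procedure may differ).
import numpy as np

rng = np.random.default_rng(0)

# Toy data: per-subject, per-condition accuracy (20 subjects x 8 conditions).
human = rng.uniform(0.5, 1.0, size=(20, 8))


def split_half_ceiling(data: np.ndarray, n_splits: int = 1000) -> np.ndarray:
    """Distribution of correlations between condition-wise means of two random subject halves."""
    n_subjects = data.shape[0]
    ceilings = np.empty(n_splits)
    for i in range(n_splits):
        order = rng.permutation(n_subjects)
        half_a = data[order[: n_subjects // 2]].mean(axis=0)
        half_b = data[order[n_subjects // 2:]].mean(axis=0)
        ceilings[i] = np.corrcoef(half_a, half_b)[0, 1]
    return ceilings


ceiling = split_half_ceiling(human)
model_consistency = 0.78  # hypothetical model-to-human correlation
# "Within the noise ceiling" here means the model's consistency is not reliably
# below the human-to-human consistency distribution.
print(np.mean(ceiling), np.percentile(ceiling, [2.5, 97.5]))
print(model_consistency >= np.percentile(ceiling, 2.5))
```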
On current vision models
We are not dogmatically committed to any current deep ANN model of human vision; none is a perfect model of human vision, as the Brain-Score effort helped illuminate. However, we disagree with Bowers et al.'s claim that deep ANNs are not the currently leading models of human ventral visual processing. Bowers et al. critique ANN models without offering a better alternative: They imply that better models exist or should exist, but do not elaborate on what those models are. In the absence of an alternative model, it is justifiable to refer to ANNs as the currently best models. In fact, as can be seen on Brain-Score, some ANN models not only predict neural responses moderately well at multiple visual processing stages, but also predict, to some extent, even quite challenging behavioral data patterns (Geirhos et al., 2021; Rajalingham et al., 2018).
Bowers et al. eschew community-transparent suites of benchmarks, yet they imply an alternative notion of vision model evaluation that is somehow not a suite of benchmarks. But again, they do not offer a feasible alternative. Of course, the model rankings produced by benchmarks also depend on the choice of datasets and the metric used for evaluation. We will continue to help the Brain-Score community expand the range of datasets, and we are not dogmatically committed to any particular choice of metric. Different subcommunities may prefer to initially focus on different metrics (e.g., to identify the currently best behavioral model regardless of underlying brain alignment, or vice versa), and Brain-Score should support those different benchmark weightings. But we see no way to support advances in models of vision other than open, transparent, and community-driven model comparison.
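To illustrate how different benchmark weightings can change model rankings without changing any underlying scores, here is a minimal sketch with invented per-benchmark scores (not Brain-Score results): the same three hypothetical models are re-ranked under an equal weighting versus a behavior-only weighting.

```python
# Minimal sketch: re-ranking models under different benchmark weightings.
# All scores and weights below are invented for illustration (not Brain-Score results).
from typing import Dict, List

# Hypothetical per-benchmark scores for three models (higher is better).
scores: Dict[str, Dict[str, float]] = {
    "model_A": {"V1": 0.60, "IT": 0.55, "behavior": 0.40},
    "model_B": {"V1": 0.45, "IT": 0.50, "behavior": 0.60},
    "model_C": {"V1": 0.55, "IT": 0.60, "behavior": 0.50},
}


def rank(weights: Dict[str, float]) -> List[str]:
    """Rank models by a weighted mean of their benchmark scores."""
    total = sum(weights.values())
    aggregate = {
        model: sum(weights[b] * s[b] for b in weights) / total
        for model, s in scores.items()
    }
    return sorted(aggregate, key=aggregate.get, reverse=True)


print(rank({"V1": 1.0, "IT": 1.0, "behavior": 1.0}))  # equal weighting
print(rank({"V1": 0.0, "IT": 0.0, "behavior": 1.0}))  # behavior-only weighting
```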
On building new vision models
Bowers et al. appear to favor a classic approach in which a separate model is built for each psychological phenomenon, using specialized stimuli that are hand-crafted to enable certain visual features to be well-defined – for example, illusory contours or shape primitives. The appeal of this approach is that it reduces the complexity of a high-dimensional pixel input space into small, intuitive sets of features that enable the formulation and testing of conceptual hypotheses about vision – for example, the mechanisms of a particular class of visual illusions. However, because this approach requires dramatically restricting the stimuli under consideration, such hypotheses often cover a near-zero fraction of image space. In our opinion, the idea that a universal scientific model of human vision will result from sets of fragmented explanations that each engage only a tiny fraction of image space is illusory (Newell, 1973).
In contrast, the approach that we favor – starting with image-computable models – enables tangible progress toward a unified model of human vision. Transparent tracking of model shortcomings lights the path to this goal. We acknowledge that the image-computability requirement may make the formulation of traditional conceptual tests of a model more challenging. But it by no means makes such tests impossible. Any pattern of behavioral data, including those discussed in the target article, should be translatable into a behavioral benchmark on Brain-Score.
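As a sketch of what such a translation could look like, a behavioral benchmark needs little more than the stimuli, the measured human response pattern, a reliability ceiling, and a metric for comparing a model's responses to that pattern. The structure and names below are our own illustration, not Brain-Score's actual benchmark interface.

```python
# Illustrative skeleton of a behavioral benchmark: stimuli, human data, metric, ceiling.
# Names and structure are our own sketch, not Brain-Score's actual benchmark interface.
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np


@dataclass
class BehavioralBenchmark:
    stimulus_paths: Sequence[str]          # images shown to human subjects
    human_pattern: np.ndarray              # e.g., per-stimulus choice probabilities
    ceiling: float                         # reliability of the human pattern
    metric: Callable[[np.ndarray, np.ndarray], float]  # model-vs-human similarity

    def score(self, model_fn: Callable[[str], float]) -> float:
        """Run the model on each stimulus and compare its response pattern to the human pattern."""
        model_pattern = np.array([model_fn(p) for p in self.stimulus_paths])
        return self.metric(model_pattern, self.human_pattern) / self.ceiling
```

Given such a benchmark object, any image-computable model – wrapped as a function from a stimulus to a response – can be scored on the same footing as any other.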
Moving forward
Ultimately, we think that the advantages that image-computable models offer in enabling evaluation of predictions about diverse visual stimuli and phenomena heavily outweigh their disadvantages. And maintaining and expanding a common evaluation scheme for image-computable models of vision is, in our view, a prerequisite for channeling the valuable contributions of vision science – across neuroscience, cognitive science, psychology, and computer vision – toward convergence on the best scientific models of human vision. Let's move forward!
Acknowledgments
We thank Kohitij Kar, Michael Lee, Nancy Kanwisher, Nikolaus Kriegeskorte, and Chris Shay for helpful discussions and support.
Financial support
This work was supported in part by the Semiconductor Research Corporation (SRC) and DARPA (J. J. D.), Simons Foundation (542965, J. J. D.), Office of Naval Research (MURI N00014-21-1-2801; N00014-20-1-2589, J. J. D., D. L. K. Y.), and National Science Foundation (2124136, J. J. D.).
Competing interest
M. B. is a co-founder of Maddox AI. All other authors have no competing interest.