In keeping with what Turing proposed for the imitation game (Turing, 1950), a good brain-computational model (Kriegeskorte & Douglas, 2018) would not be one that performs a particular task with accuracy equal to or greater than that of a human being, but rather one that is indistinguishable from a human being vis-à-vis input and output. Psychophysics, interestingly, is also about input and output, with the brain as a black box in between (Read, 2015). Bowers et al. provide a comprehensive presentation of the incongruence between deep neural networks (DNNs) and the visual brain, but fail to note this relevant connection of psychophysics to neuroscience for brain-computational modeling (Read, 2015).
Psychophysics is “the analysis of perceptual processes by studying the effect on a subject's experience or behavior of systematically varying the properties of a stimulus along one or more physical dimensions” (Bruce, Green, & Georgeson, 2003). The psychophysics stimulus for vision can be an image or a video, and a DNN, being an information-processing system, may model the subject's response to the stimulus using supervised learning. David Marr proposed that an information-processing system should be understood at three levels: computational, algorithmic, and implementation. The psychophysics task describes the computational-level problem, a DNN that performs the same task in silico would represent the algorithmic level, and the electrophysiological or fMRI data obtained during the task would be a by-product of the implementation of the algorithm in the biological brain. If the DNN is treated as an equivalent mapping between input and output, as in a psychophysics experiment, then the input can be represented by a tensor, whether it is an image, a video, a sound signal, or a spatially invariant visual stimulus such as flicker. The output would also have a numerical representation which, in the case of psychophysics experiments, could be a classification; a perceived brightness, color, shape, size, motion, or intensity at a particular location in the input signal; or a comparison between two such perceived sensations at different locations of the stimulus, separated by space, time, or both. The algorithm that transforms the stimulus input into the output is not evident from psychophysics experiments, but a DNN can construct that algorithm without the programmer knowing it exactly.
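The input-output mapping described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the stimulus is a hypothetical 3 x 3 image with one centre pixel on a uniform surround, the "subject" is a simulated observer (`simulated_observer`) rather than real psychophysics data, and a single logistic unit stands in for a DNN. The point is only that the model is fitted to (stimulus, response) pairs without being told the observer's rule.

```python
import math
import random

def make_stimulus(centre, surround, size=3):
    """Flattened size x size image: one centre pixel on a uniform surround."""
    pixels = [surround] * (size * size)
    pixels[(size * size) // 2] = centre
    return pixels

def simulated_observer(centre, surround):
    """Hypothetical subject (an assumption, not real data): reports 'bright' (1)
    when the centre exceeds a criterion that rises with surround luminance."""
    return 1 if centre - 0.5 * surround > 0.25 else 0

def predict(weights, bias, pixels):
    """Probability of a 'bright' response from one logistic unit."""
    z = bias + sum(w * x for w, x in zip(weights, pixels))
    z = max(-50.0, min(50.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(trials, epochs=200, lr=0.1):
    """Fit the unit by stochastic gradient descent on (stimulus, response) pairs."""
    weights, bias = [0.0] * len(trials[0][0]), 0.0
    for _ in range(epochs):
        for pixels, response in trials:
            error = response - predict(weights, bias, pixels)
            weights = [w + lr * error * x for w, x in zip(weights, pixels)]
            bias += lr * error
    return weights, bias

# Build a dataset by varying two physical parameters of the stimulus.
rng = random.Random(0)
trials = []
for _ in range(200):
    c, s = rng.random(), rng.random()
    trials.append((make_stimulus(c, s), simulated_observer(c, s)))

weights, bias = train(trials)
```

After training, `predict(weights, bias, make_stimulus(0.5, 0.0))` and `predict(weights, bias, make_stimulus(0.5, 1.0))` can be compared to see whether the fitted model, like the simulated observer, judges the same centre luminance differently on dark and bright surrounds.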
The dataset can be prepared by manipulating the physical parameters of the stimulus and recording the subject's response to each stimulus. The psychophysics data of different human subjects can differ for the same stimuli (Read, 2015), so a better strategy is to train and test a DNN on the psychophysics data of the same subject. Kubota, Hiyama, and Inami (2021) used psychophysics data obtained from brightness illusions to train DNNs, and showed that human perception can be compared with the output of DNNs trained in this way. A DNN may also be tested on a stimulus completely different from the one it was trained on, provided that its output layer has a representation similar to that required for the new stimulus. Recently, Ghosh and Chandran (2021) proposed such a technique for flicker stimuli. The intermediate outputs of a DNN can be compared with electrophysiological brain signals, as done by Zipser and Andersen (1988) and, more recently, by Chandran and Ghosh (2021, 2022) with EEG. We argue that more testable models can be constructed by training on tasks that are less computationally intensive than, for example, object classification into thousands of classes. For instance, a convolutional neural network (CNN) trained for low-level visual tasks is deceived by brightness and color illusions (Gomez-Villa, Martín, Vazquez-Corral, Bertalmío, & Malo, 2020). DNNs have also been put forth to solve tasks used in experimental psychology, such as Raven's progressive matrices (Jahrens & Martinetz, 2020).
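The per-subject strategy mentioned above can be sketched as a data-splitting step. This is only an illustration under stated assumptions: the trial format `(subject_id, stimulus_params, response)`, the function name `split_per_subject`, and the example trials are all hypothetical and are not taken from any of the cited studies.

```python
import random
from collections import defaultdict

def split_per_subject(trials, test_fraction=0.25, seed=0):
    """Group trials by subject and split each subject's trials separately,
    so that a model is trained and tested on data from the same observer.

    trials: list of (subject_id, stimulus_params, response) tuples.
    Returns {subject_id: (train_trials, test_trials)}.
    """
    by_subject = defaultdict(list)
    for trial in trials:
        by_subject[trial[0]].append(trial)

    rng = random.Random(seed)
    splits = {}
    for subject, subject_trials in by_subject.items():
        shuffled = subject_trials[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_fraction))
        splits[subject] = (shuffled[n_test:], shuffled[:n_test])
    return splits

# Hypothetical trials: two subjects with different detection criteria.
trials = (
    [("S1", {"contrast": c / 10}, int(c > 4)) for c in range(10)]
    + [("S2", {"contrast": c / 10}, int(c > 6)) for c in range(10)]
)
splits = split_per_subject(trials)
```

Each entry of `splits` can then be used to fit and evaluate one model per observer, which keeps between-subject differences out of the test error.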
New network models, different from the engineering-oriented image-classification DNNs, could be constructed for this purpose, as Zipser and Andersen (1988) did to model how the monkey brain finds the head-centered coordinates of external objects. It could be easier to correlate the outputs of intermediate layers with brain signals for a neural network with fewer neurons and layers than for a complex network.
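One simple way to make such a comparison concrete is a Pearson correlation between the activation of each unit in an intermediate layer and a recorded signal, both measured across the same set of stimuli. The sketch below assumes this particular measure for illustration; it is not claimed to be the analysis used in the cited studies, and the unit names and numbers are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def best_matching_unit(layer_activations, brain_signal):
    """Return the unit whose activation across stimuli correlates most
    strongly (in absolute value) with the signal, plus all the scores.

    layer_activations: {unit_name: [activation for each stimulus]}
    brain_signal:      [signal amplitude for each stimulus]
    """
    scores = {unit: pearson(act, brain_signal)
              for unit, act in layer_activations.items()}
    best = max(scores, key=lambda unit: abs(scores[unit]))
    return best, scores

# Hypothetical values for four stimuli (illustration only).
layer_activations = {
    "unit_a": [0.1, 0.4, 0.5, 0.9],
    "unit_b": [0.9, 0.2, 0.8, 0.1],
}
brain_signal = [1.0, 2.1, 2.4, 4.0]
best, scores = best_matching_unit(layer_activations, brain_signal)
```

With fewer units and layers, a table of scores like this stays small enough to inspect unit by unit, which is the practical advantage argued for above.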
Bowers et al. mention that DNNs trained on ImageNet, unlike human vision, do not encode the three-dimensional (3D) features of objects or their depth. Such DNNs are trained on datasets captured with monocular cameras. The mammalian brain, however, receives information from two eyes, and it is known that human subjects with one eye are less efficient at depth perception (Westlake, 2001). Robots equipped with stereo cameras and DNNs can perform tasks such as calculating the position of detected fruit (Onishi et al., 2019). Stereo vision can enable autonomous vehicles to perform tasks such as object detection, 3D information acquisition, and depth perception (Fan, Wang, Junaid Bocus, & Pitas, 2023). The mammalian brain has had input from two eyes throughout its evolutionary history, so training DNNs on stereo camera data might be needed to develop the equivalents of many circuits in the brain.
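Why two views carry depth information that one view lacks can be seen from the standard pinhole-stereo relation for a rectified camera pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal disparity of a point between the two images. The sketch below shows only this classical geometry; it is not the DNN-based method of the cited works, and the function name and numbers are illustrative assumptions.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth (in metres) of a point seen by a rectified stereo pair.

    focal_length_px: focal length expressed in pixels
    baseline_m:      distance between the two camera centres, in metres
    disparity_px:    horizontal shift of the point between left and right images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point at finite depth")
    return focal_length_px * baseline_m / disparity_px

# Illustrative values: nearer points produce larger disparities.
near = depth_from_disparity(700.0, 0.1, 35.0)
far = depth_from_disparity(700.0, 0.1, 7.0)
```

A single monocular image provides no disparity term at all, so a network trained only on such images has to infer depth from indirect cues.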
To conclude, psychophysics with DNNs could be used to construct many of the smaller agents that compose the human mind, as proposed by Minsky (1988). The vision agents that compose the mind likewise need to be constructed via DNNs; these agents may be associated with fundamental activities such as brightness perception, motion detection, and depth perception, or with even less intelligent activities, in the parallel visual pathways. Neural networks for more complex tasks can be built by combining smaller DNNs through shared layers, or by using the output of some layers of one DNN as input to layers of another DNN.
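The idea of combining smaller networks through shared layers can be sketched as one shared layer feeding several task-specific output layers. The class name `SharedTrunkModel`, the task names, and the hand-picked weights below are hypothetical and chosen only to make the structure visible; in practice the weights would be learned from psychophysics data.

```python
def dense(inputs, weights, biases):
    """One fully connected layer with ReLU: one output per row of weights."""
    return [max(0.0, b + sum(w * x for w, x in zip(row, inputs)))
            for row, b in zip(weights, biases)]

class SharedTrunkModel:
    """A shared layer ('trunk') whose output feeds several small task heads."""

    def __init__(self, trunk, heads):
        self.trunk = trunk   # (weights, biases) of the shared layer
        self.heads = heads   # {task_name: (weights, biases)}

    def forward(self, stimulus):
        shared = dense(stimulus, *self.trunk)  # computed once, reused by all heads
        return {task: dense(shared, *params)
                for task, params in self.heads.items()}

# Hypothetical two-pixel stimulus and hand-picked weights (illustration only).
model = SharedTrunkModel(
    trunk=([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    heads={
        "contrast": ([[1.0, 0.0]], [0.0]),     # reads the difference feature
        "brightness": ([[0.0, 1.0]], [0.0]),   # reads the mean feature
    },
)
outputs = model.forward([0.8, 0.2])
```

The alternative mentioned above, feeding the output of some layers of one DNN into another, corresponds to passing the result of one `dense` call as the `inputs` of a layer belonging to a different model.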
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Competing interest
None.