Bowers et al. report several lines of evidence challenging the alleged similarities between deep neural network (DNN) models of visual recognition and their biological counterparts. However, human visual experience is not limited to visual recognition. Beyond the case of visual illusions presented by Bowers et al., models of the human visual system should also account for a range of other visual experiences, including visual hallucinations, dreams, and mental imagery. For example, most of us can “visualize” objects in their absence by engaging in visual mental imagery. Drawing on neural machinery partially shared with visual perception, visual mental imagery allows us to make predictions based on past experiences, imagine future possibilities, and simulate the possible outcomes of our decisions. Our commentary focuses on these relationships and is structured around four key points.
First, shared neural substrates of visual perception and visual mental imagery include high-level visual regions in the ventral temporal cortex (Bartolomeo, Hajhajate, Liu, & Spagna, 2020; Spagna, Hajhajate, Liu, & Bartolomeo, 2021). In the absence of visual input, these regions are activated top-down by other systems, such as the semantic system and the frontoparietal attention networks. Bowers et al. highlighted the challenge of modeling top-down activity with feedforward DNNs. Current evidence suggests that the visual system relies on distinct feedback signals to different cortical layers and exhibits distinct temporal dynamics for different visual experiences. In particular, visual stimulation modulates activity in the middle layers, whereas contextual information or illusory content feeds back to superficial layers, and visual imagery feeds back to deeper cortical layers (Bergmann, Morgan, & Muckli, 2019; Muckli et al., 2015). Visual imagery overlaps in time with perceptual processing during its late stages (Dijkstra, Mostert, Lange, Bosch, & van Gerven, 2018), likely corresponding to activity in the ventral temporal cortex but not in the early visual cortex (Spagna et al., 2021). In contrast, patients with Charles Bonnet hallucinations show a gradual increase in activity in the early visual cortex, which then progressively decreases as it propagates further along the visual hierarchy (Hahamy, Wilf, Rosin, Behrmann, & Malach, 2021).
Second, evidence from neuropsychology, neuroimaging, and direct cortical stimulation suggests striking differences between the two hemispheres in how the ventral temporal cortex processes visual information (Liu, Spagna, & Bartolomeo, 2022b). Direct electrical stimulation of the cortex tends to produce visual hallucinatory experiences predominantly when applied to the right temporal lobe, whereas voluntary visual mental imagery is strongly lateralized to the left hemisphere. These asymmetries could stem from predispositions of particular hemispheric networks toward either constructing mental models of the external environment or verifying them through real-world testing (Bartolomeo & Seidel Malkinson, 2022). After a unilateral stroke, the intact hemisphere can in some cases compensate for the visual deficit (Bartolomeo & Thiebaut de Schotten, 2016). At present, DNN models incorporate neither hemispheric asymmetries nor the potential reorganization of these asymmetries after a stroke.
Third, some otherwise neurotypical individuals show unusually weak or strong visual mental imagery (aphantasia and hyperphantasia, respectively) (Keogh, Pearson, & Zeman, 2021; Milton et al., 2021). Aphantasic individuals perform visual imagery and visual perceptual tasks with accuracy similar to that of typical imagers, but with slower response times (Liu & Bartolomeo, 2023). Consistent with these behavioral results, ultra-high field fMRI shows similar activation patterns in typical imagers and in individuals with congenital aphantasia (Liu, Zhan, et al., 2023). The fusiform imagery node, a high-level visual region in the left-hemisphere ventral temporal cortex (Spagna et al., 2021), coactivates with dorsolateral frontoparietal networks in typical imagers but is functionally isolated from these networks in aphantasic individuals during both imagery and perception. These findings suggest that high-level visual information in the ventral cortical stream is not sufficient to generate a conscious visual experience, and that a functional disconnection from frontoparietal networks may account for the lack of experiential content in the visual mental imagery of aphantasic individuals.
Fourth, in line with the previous point on the importance of frontoparietal networks, the way we subjectively experience both percepts and mental images relies heavily on interactions with other cognitive processes, such as attention and visual working memory. Despite their importance, these factors are not taken into account in current DNN models. A recent study combining human intracerebral recordings with single-layer recurrent neural network modeling found that dynamic interactions between specific frontoparietal attentional networks and high-level visual areas play a crucial role in conscious visual perception (Liu, Bayle, et al., 2023).
This evidence from the human brain can inspire future developments of DNNs that simulate the cognitive architecture of human visual experience. Generative adversarial networks may be promising candidates to drive these efforts forward. For instance, imagery mechanisms could act as the generator of quasi-perceptual experiences, while reality monitoring could serve as the discriminator that determines whether sensory inputs come from real or imagined sources (Gershman, 2019; Lau, 2019). Recent studies have investigated involuntary visual experiences using generative neural network models, including memory replay (van de Ven, Siegelmann, & Tolias, 2020), intrusive imagery (Cushing et al., 2023), and adversarial dreaming (Deperrois, Petrovici, Senn, & Jordan, 2022). For voluntary visual mental imagery, key strategies may include modeling the retrieval of representations of semantic information and visual features (Liu, Zhan, et al., 2023) and incorporating biologically inspired recurrence into visual imagery processing (Lindsay, Mrsic-Flogel, & Sahani, 2022).
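To make the generator/discriminator analogy concrete, the following is a deliberately tiny one-dimensional GAN written in plain Python. It is an illustrative sketch under stated assumptions, not a model proposed in the cited work: the hypothetical `Generator` stands in for imagery mechanisms (mapping latent noise to a "quasi-percept"), the hypothetical `Discriminator` stands in for reality monitoring (estimating the probability that an input came from the senses), and "real" sensory input is assumed to be Gaussian samples. All names and parameter values (`train`, `real_mean`, `lr`, etc.) are illustrative choices.

```python
import math
import random


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


class Generator:
    """Hypothetical 'imagery' module: maps latent noise z to x = a*z + b."""

    def __init__(self, a: float = 1.0, b: float = 0.0) -> None:
        self.a, self.b = a, b

    def __call__(self, z: float) -> float:
        return self.a * z + self.b


class Discriminator:
    """Hypothetical 'reality monitor': D(x) = estimated P(x came from the senses)."""

    def __init__(self, w: float = 0.0, c: float = 0.0) -> None:
        self.w, self.c = w, c

    def __call__(self, x: float) -> float:
        return sigmoid(self.w * x + self.c)


def train(steps: int = 500, lr: float = 0.05, real_mean: float = 3.0,
          real_sd: float = 1.0, seed: int = 0) -> tuple[Generator, Discriminator]:
    """Alternating single-sample GAN updates (assumption: 'real' input is Gaussian)."""
    rng = random.Random(seed)
    gen, disc = Generator(), Discriminator()
    for _ in range(steps):
        x_real = rng.gauss(real_mean, real_sd)  # stand-in for sensory input
        z = rng.gauss(0.0, 1.0)                 # latent noise driving imagery
        x_fake = gen(z)                         # imagined quasi-percept

        # Discriminator: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        d_real, d_fake = disc(x_real), disc(x_fake)
        disc.w += lr * ((1.0 - d_real) * x_real - d_fake * x_fake)
        disc.c += lr * ((1.0 - d_real) - d_fake)

        # Generator: gradient ascent on log D(G(z)) (non-saturating loss).
        # d/dx log D(x) = (1 - D(x)) * w, with dx/da = z and dx/db = 1.
        grad_x = (1.0 - disc(x_fake)) * disc.w
        gen.a += lr * grad_x * z
        gen.b += lr * grad_x
    return gen, disc


if __name__ == "__main__":
    gen, disc = train()
    print(f"generator offset b = {gen.b:.2f}")
    print(f"P(real | x=3.0) = {disc(3.0):.2f}")
```

Because the discriminator here is linear in `x`, this reality monitor can only compare the average of real and imagined inputs, and plain alternating gradient updates of this kind tend to oscillate around the equilibrium rather than converge reliably. The example is meant only to show the adversarial division of labor, not to be a realistic model of imagery.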
In conclusion, we suggest that shared representations in visual cortex are not the primary factor in generating and distinguishing different visual experiences. Rather, the temporal dynamics and functional connectivity of the underlying processes are essential. Current DNNs are not yet adequate to model the complexity of human visual experience. Biologically inspired generative adversarial networks may provide novel ways of simulating the varieties of human visual experience.
Financial support
J. L. received funding from Dassault Systèmes. The work of P. B. is supported by the Agence Nationale de la Recherche through ANR-16-CE37-0005 and ANR-10-IAIHU-06, and by the Fondation pour la Recherche sur les AVC through FR-AVC-017.
Competing interest
None.