Almaatouq et al. provide an incisive and welcome critique of the dominant one-at-a-time paradigm. They argue for integrative studies that systematically and comprehensively explore the “universe of possible experiments” (target article, sect. 2.2, para. 1) in each domain of inquiry. While we are sympathetic to the goal, Almaatouq et al. overemphasize systematic at the expense of comprehensive.
As we see it, the core problem with the one-at-a-time approach is that it is too slow. It is not news that most studies extrapolate broadly from a minuscule sample of stimuli, subject demographics, and experimental paradigms, resulting in a long-running generalizability crisis (Clark, 1973; Henrich, Heine, & Norenzayan, 2010; Judd, Westfall, & Kenny, 2012; Yarkoni, 2022). Even large literatures often fail to do much more than scratch the surface of possibilities (e.g., Hartshorne & Snedeker, 2013; Peterson, Bourgin, Agrawal, Reichman, & Griffiths, 2021). It is as if we set out to explore the universe of possible experiments, but spent most of our time hanging out at the hotel pool.
In principle, Almaatouq et al.'s integrative experiment approach is ideal: Find the set of parameters that describe the universe of possible experiments and then survey systematically. Unfortunately, this requires better understanding of the phenomenon than we usually have. Indeed, the cognitive and behavioral sciences remain largely in Kuhn's preparadigmatic phase (Kuhn, 2012), characterized by conflicting and incommensurate theories, each with its own set of assumptions, methods, and observations.
Let us illustrate the difficulty with an easy-to-articulate question: Are children better at learning all aspects of syntax or just certain parts? Answering this question requires comparing how quickly older and younger learners learn each component of syntax. The problem is that different theories propose radically different visions of what syntax is and how it might be subdivided. Theories differ in terms of whether syntax is governed by large numbers of highly articulated, abstract rules that combine structurally simple words; by a small number of simple rules that combine internally complex words; or by superimposed, prototype-like patterns, with no distinction between words and rules; among other possibilities (Chomsky, 2014; Goldberg, 1995, 2009; Hopcroft, Motwani, & Ullman, 2001; Steedman, 2001). Even where theorists agree on the structure, they disagree on processing, with the same grammatical patterns subserved by different cognitive/neurological systems at different times or by different people (O'Donnell, 2015; Ullman, 2001).
To make matters worse, none of these are complete theories that can be applied to arbitrary stimuli. For starters, predictions depend on ancillary assumptions that are left as open empirical questions (such as the relative preference for generalizations vs. one-off computations). More problematically, different theories often prioritize explaining distinct phenomena (common or rare utterances; highly productive patterns or semi-idiomatic expressions; early child language or mature usage; similarities across languages or cross-linguistic differences); one theory may not make clear predictions about the core motivating phenomena for another, and vice versa. While we believe this reflects the difficulty of the problem, not the diligence of the researchers, the outcome is the same: There is a lot of theoretical progress lying between here and the integrative experiments proposed by Almaatouq et al.
Even our example above underestimates the problem. Almaatouq et al. focus primarily on sampling from a stimulus space, not a task space. Two of their examples (trolley problems and risky-choice scenarios) are narrowly defined paradigms for studying much broader phenomena (moral reasoning and decision making under uncertainty). Their third example (masked cueing) does involve manipulating some task parameters beyond the stimuli themselves, but remains tied to a narrowly circumscribed task.
This might be fine if we fully understood the relationship between tasks and the underlying cognitive processes, but mostly we do not. Consider, for instance, measures of cognitive control, itself one of the most thoroughly investigated constructs in cognitive psychology. There are a number of popular tasks used to study cognitive control, including the masked cueing paradigm described by Almaatouq et al. Recently, one of us directly compared cognitive control as measured by three closely related tasks: the Simon, Stroop, and flanker tasks (Erb et al., 2023). Two massive online experiments with more than 20,000 participants revealed that these three tasks show strikingly different patterns of change in performance over the lifespan and near-zero correlations with one another. Thus, integrative studies of cognitive control need to sample not just across stimuli but also across paradigms. However, it is not clear that the differences across paradigms/tasks can be easily parameterized. Indeed, advances in our fields often owe themselves to the creation of new paradigms that open up new questions or comparisons.
Perhaps we are too pessimistic. Perhaps most questions resemble trolley problems and few resemble syntax or cognitive control (though we note that one of Almaatouq et al.'s three examples actually investigated cognitive control). But we would hate to predicate moving beyond the one-at-a-time approach on the widespread feasibility of parameterization. We worry that this licenses researchers (and editors and funders) to let perfect be the enemy of better – because we can do much better.
In particular, Almaatouq et al. may have only been able to find three examples of systematic exploration of the universe of possible experiments, but comprehensive explorations abound. This includes megastudies that test large, diverse sets of stimuli (e.g., Breithaupt, Li, & Kruschke, 2022; Brysbaert, Stevens, Mandera, & Keuleers, 2016; De Deyne, Navarro, Perfors, Brysbaert, & Storms, 2019; Hartshorne, Bonial, & Palmer, 2014), broad subject demographics (e.g., Bleidorn et al., 2013; Hartshorne, Tenenbaum, & Pinker, 2018; Nosek, Banaji, & Greenwald, 2002; Riley et al., 2016; Soto, John, Gosling, & Potter, 2011; Spiers, Coutrot, & Hornberger, 2023), or a range of related tasks (e.g., Erb, Germine, & Hartshorne, 2023; Hampshire, Highfield, Parkin, & Owen, 2012). Even without systematic exploration, these studies have produced major theoretical discoveries. They have also been instrumental in identifying important Almaatouq et al.-style parameters for subsequent systematic exploration. Critically, as Almaatouq et al. explain, the technology exists to conduct megastudies for most cognitive and behavioral questions, typically at lower aggregate cost than the status quo (see also Gosling & Mason, 2015; Li, Germine, Mehr, Srinivasan, & Hartshorne, 2022; Long, Simson, Buxó-Lugo, Watson, & Mehr, 2023). In short, our critique is of the “yes, and” variety. Yes, conduct systematic integrative metastudies when you can. And, when you cannot, conduct less systematic megastudies.
Financial support
The authors acknowledge funding from NSF 2229631 to J. K. H.
Competing interest
None.