
Getting lost in an infinite design space is no solution

Published online by Cambridge University Press:  05 February 2024

Mario Gollwitzer*
Affiliation: Department of Psychology, Ludwig-Maximilians-Universität München, München, Germany
mario.gollwitzer@psy.lmu.de
https://www.psy.lmu.de/soz_en/team/professors/mario-gollwitzer/index.html

Johannes Prager
Affiliation: Department of Psychology, Ludwig-Maximilians-Universität München, München, Germany
jo.prager@psy.lmu.de
https://www.psy.lmu.de/soz_en/team/academic-staff/prager/index.html

*Corresponding author.

Abstract

Almaatouq et al. argue that an “integrative experiment design” approach can help generate cumulative empirical and theoretical knowledge. Here, we discuss the novelty of their approach and scrutinize its promises and pitfalls. We argue that setting up a “design space” may turn out to be theoretically uninformative, inefficient, and even impossible. Designing truly diagnostic experiments provides a better alternative.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

Almaatouq et al. argue that research findings in the behavioral and social sciences are rarely conclusive and even less integrative or cumulative. They claim that these insufficiencies largely originate from a flawed approach to research design and the “one-at-a-time” procedure of planning and conducting experiments. Almaatouq et al. propose an alternative: “Integrative experimentation.”

While we share many of the concerns raised in the target article, we have reservations about the proposed alternative. First, the “integrative experimentation” idea is – in principle – not novel. Similar demands for integrative, diagnostic, and cumulative experimentation have been voiced repeatedly in the past (e.g., Brunswik, 1955; Campbell, 1957; Meehl, 1978; Platt, 1964). That said, truly cumulative research endeavors and rigorous theory testing have not really been a strength of psychological science so far. Walter Mischel once called this the “toothbrush problem” of psychological science: “Psychologists treat other peoples’ theories like toothbrushes – no self-respecting person wants to use anyone else's” (Mischel, 2008). Not much has changed since then – at least not until the replication crisis hit psychology full force.

Second, and more importantly, constructing the “design space,” the first step in Almaatouq et al.'s integrative experimentation approach, is practically very difficult, if not impossible. Where should the dimensions that constitute a design space come from? One problem is that these dimensions are either theoretical constructs themselves or at least defined and informed by theoretical assumptions, and that their number is potentially infinite. Consider the group synergy/group performance example discussed in the target article: Design-space dimensions such as “social perceptiveness” or “cognitive-style diversity” are theoretical constructs. The theories that underlie these concepts provide definitions, locate them in a nomological network, and allow the construction of measurement tools. The latter is particularly important because a measurement theory is necessary to test the construct validity of operationalizations (measures and/or manipulations) for a given construct (e.g., Fiedler, McCaughey, & Prager, 2021). So, each design-space dimension turns out to have its own design space. Constructing design spaces may therefore lead to infinite regresses.

Third, and related to our previous point, setting up a design space is necessarily contingent on both the focal hypotheses and all accompanying auxiliary assumptions, some of which are relevant for testing (and building) a theory, while others are not. For instance, operationalizing “cognitive-style diversity” either as the sum of the within-team standard deviations across cognitive styles (Aggarwal & Woolley, 2013) or as the average of team members' intrapersonal cognitive-style diversity scores (Bunderson & Sutcliffe, 2002) may be relevant for testing a measurement theory, yet irrelevant for testing a substantive theory. Almaatouq et al.'s claim that “All sources of measurable experimental-design variation are potentially relevant, and decisions about which parameters are relatively more or less important are to be answered empirically” (target article, sect. 3, para. 1) disregards the (in our view, important) distinction between conceptually relevant and conceptually irrelevant design-space dimensions (see Gollwitzer & Schwabe, 2022).
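To make the contrast concrete, here is a minimal sketch in our own notation (not taken from the cited studies): Let $x_{ij}$ denote member $i$'s score on cognitive-style dimension $j$ in a team of $n$ members. The first operationalization aggregates between-member dispersion, for example $D_{\text{team}} = \sum_{j} \mathrm{SD}_{i}(x_{ij})$, that is, the within-team standard deviation summed over the style dimensions, whereas the second averages each member's own diversity index, $D_{\text{intra}} = \frac{1}{n} \sum_{i=1}^{n} d_{i}$, where $d_{i}$ could be, say, a Blau- or entropy-type index computed over member $i$'s individual style profile. Both are defensible measures of “cognitive-style diversity,” but the choice between them is a measurement decision, not a substantive one.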

Fourth, all design-space decisions necessarily require a universal metatheory. Such a metatheory does not exist, and it is unlikely to appear in the future. To be clear: We do not oppose the idea of integrative experimentation. Indeed, we second Almaatouq et al.'s (and Walter Mischel's) call for more theory building and integration (“cumulative science”). But without a metatheory that can help us define the design space, a theoretically guided and targeted “one-at-a-time approach” may eventually yield more solid cumulative evidence than a theory-free “integrative experimentation” approach.

Fifth, and finally, because the resources and participants required for conducting experiments are limited, randomly combining a potentially infinite set of design factors is inefficient. Atheoretical sampling from the design space consumes a tremendous amount of resources, which are better spent testing theories in a maximally rigorous and truly falsification-oriented fashion.

Instead of randomly sampling from an infinite design space, a more useful rationale for setting up an integrative experimental series requires acknowledging a hierarchy regarding the informativeness and discriminability of potential design-factor decisions (Fiedler, 2017). Design factors that can – either logically or theoretically – be identified as more crucial than others (e.g., because they relate to a core assumption of a theory or provide a strong test of the boundary conditions of a theory) should be prioritized over those that are more marginal. For instance, if a theory predicted that interacting groups outperform nominal groups more strongly in conjunctive than in additive tasks, an experimental series might focus on varying conjunctive and additive tasks (or varying different forms of group interaction), while holding other features that are less relevant to a rigorous test of this hypothesis constant (e.g., culture) or treating them as random factors (e.g., sample characteristics).
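Purely as an illustration of this prioritization logic (the factor names, levels, and variable names below are hypothetical placeholders, not taken from the target article or from any existing study), one could enumerate only the cells formed by the theoretically crucial factors while explicitly marking marginal features as held constant or as random:

    # Sketch only: cross the theoretically crucial 2 x 2 design,
    # hold a marginal feature constant, and note which features are
    # treated as random factors rather than crossed conditions.
    from itertools import product

    crucial_factors = {
        "group_structure": ["interacting", "nominal"],  # core theoretical contrast
        "task_type": ["conjunctive", "additive"],       # boundary-condition contrast
    }
    held_constant = {"culture": "single, fixed sample culture"}  # not crossed
    random_factors = ["sample_characteristics"]  # allowed to vary unsystematically

    # Cross only the crucial factors; marginal features stay out of the grid.
    design_cells = [dict(zip(crucial_factors, combo))
                    for combo in product(*crucial_factors.values())]

    for cell in design_cells:
        print({**cell, **held_constant, "random": random_factors})

The point of the sketch is merely that the experimental grid stays small because only factors tied to the hypothesis are crossed; everything else is fixed or randomized rather than sampled.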

Indeed, modern machine-learning methods can support the identification of the most informative and diagnostic design factors. However, such models do not learn from random and independent data, but from results that depend on design factors which were a priori defined, scaled, and considered relevant (or irrelevant), as well as on design-space boundaries set by the experimenters.

To conclude, conducting conclusive research means (1) improving the precision of formulated hypotheses, (2) specifying metahypotheses from which (at least some) auxiliary assumptions and boundary conditions can be deduced, and (3) pursuing a systematic, exclusive, and diagnostic testing strategy. From our perspective, systematic research design does not mean searching through the largest design space one can afford, but carefully designing experiments that exclude alternative explanations, phenomena, or assumptions. This goal is better achieved by systematically excluding possibilities than by exploring them (Wason, 1960). Scaling up such diagnostic experimental series would not require changing the methodological paradigm: On the contrary, truly integrative, metatheoretical, conclusive, and cumulative research is nothing other than the proper execution of a long-established and widely accepted experimental methodology.

Financial support

This work was supported by a grant provided by the Deutsche Forschungsgemeinschaft (SPP 2317 – project No. 467852570) to the first author.

Competing interest

None.

References

Aggarwal, I., & Woolley, A. W. (2013). Do you see what I see? The effect of members’ cognitive styles on team processes and errors in task execution. Organizational Behavior and Human Decision Processes, 122(1), 92–99. https://doi.org/10.1016/j.obhdp.2013.04.003
Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62(3), 193–217. https://doi.org/10.1037/h0047470
Bunderson, J. S., & Sutcliffe, K. M. (2002). Comparing alternative conceptualizations of functional diversity in management teams: Process and performance effects. Academy of Management Journal, 45(5), 875–893. https://doi.org/10.2307/3069319
Campbell, D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54(4), 297–312. https://doi.org/10.1037/h0040950
Fiedler, K. (2017). What constitutes strong psychological science? The (neglected) role of diagnosticity and a priori theorizing. Perspectives on Psychological Science, 12(1), 46–61. https://doi.org/10.1177/1745691616654458
Fiedler, K., McCaughey, L., & Prager, J. (2021). Quo vadis, methodology? The key role of manipulation checks for validity control and quality of science. Perspectives on Psychological Science, 16(4), 816–826. https://doi.org/10.1177/1745691620970602
Gollwitzer, M., & Schwabe, J. (2022). Context dependency as a predictor of replicability. Review of General Psychology, 26(2), 241–249. https://doi.org/10.1177/10892680211015635
Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46(4), 806–834. https://doi.org/10.1037/0022-006X.46.4.806
Mischel, W. (2008). The toothbrush problem. APS Observer. https://www.psychologicalscience.org/observer/the-toothbrush-problem
Platt, J. R. (1964). Strong inference: Certain systematic methods of scientific thinking may produce much more rapid progress than others. Science, 146(3642), 347–353. https://doi.org/10.1126/science.146.3642.347
Wason, P. C. (1960). On the failure to eliminate hypotheses in a conceptual task. The Quarterly Journal of Experimental Psychology, 12, 129–140. https://doi.org/10.1080/17470216008416717