
Measurement validity and the integrative approach

Published online by Cambridge University Press:  05 February 2024

Wendy C. Higgins*
Affiliation: School of Psychological Sciences, Macquarie University, Sydney, NSW, Australia. wendy.higgins@mq.edu.au, https://researchers.mq.edu.au/en/persons/wendy-higgins

Alexander J. Gillett
Affiliation: Department of Philosophy, Macquarie University, Sydney, NSW, Australia. alexander.gillett@mq.edu.au

Eliane Deschrijver
Affiliation: School of Psychological Sciences, Macquarie University, Sydney, NSW, Australia. eliane.deschrijver@mq.edu.au, https://researchers.mq.edu.au/en/persons/eliane-deschrijver

Robert M. Ross
Affiliation: Department of Philosophy, Macquarie University, Sydney, NSW, Australia. robross46@gmail.com, https://researchers.mq.edu.au/en/persons/robert-ross

*Corresponding author.

Abstract

Almaatouq et al. propose a novel integrative approach to experiments. We provide three examples of how unaddressed measurement issues threaten the feasibility of the approach and its promise of promoting commensurability and knowledge integration.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2024. Published by Cambridge University Press

When scientists lack validity evidence for measures, they lack the necessary information to evaluate the overall validity of a study's conclusions.

Flake and Fried (2020, p. 457)

Questionable measurement practices are widespread in the social and behavioural sciences and raise serious questions about the interpretability of numerous studies (Flake & Fried, 2020; Lilienfeld & Strother, 2020; Vazire, Schiavone, & Bottesini, 2022). Because Almaatouq et al. do not explicitly address measurement, we argue that unresolved measurement issues may threaten the feasibility and utility of their integrative approach. Below, we present three measurement concerns.

First, the interpretability of findings from experiments designed using the integrative approach will rely on the use of valid measurements. Consider the “Moral Machine” experiment (Awad et al., 2018, 2020), which Almaatouq et al. describe as “seminal.” Utilising a modified version of the trolley problem, this experiment evaluated participants' preferences for how autonomous vehicles should weight lives in life-or-death situations along nine different dimensions. Because the experiment assessed these dimensions simultaneously and collected responses from millions of participants, Almaatouq et al. claim that it “offers numerous findings that were neither obvious nor deducible from prior research or traditional experimental designs” (target article, sect. 4.1, para. 2). One of these key findings is that participants are willing to treat people differently based on demographic characteristics when the complexity of a moral decision is increased. However, the validity of this finding has been questioned because it may be an artefact of the forced-choice methodology that was used (Bigman & Gray, 2020). In addition, there is considerable debate in moral psychology about the external validity of the trolley problem and other sacrificial dilemmas (i.e., it is unclear whether responses in these tasks predict real-world decisions or ethical judgements; Bauman, McGraw, Bartels, & Warren, 2014; Bostyn, Sevenhant, & Roets, 2018). Thus, to our minds, this example demonstrates that no matter how large and integrative an experiment might be, evaluating the validity of the measurements is essential.
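To illustrate the forced-choice concern, consider the following minimal simulation sketch (ours, with entirely hypothetical parameter values; it is not based on Bigman and Gray's data). When respondents who are indifferent between two outcomes are nonetheless forced to choose one, a response format without an explicit “treat both equally” option can manufacture an apparent majority preference:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                 # simulated respondents (hypothetical)
p_true_pref = 0.10          # share with a genuine preference for sparing group A (hypothetical)

has_pref = rng.random(n) < p_true_pref

# Forced choice: indifferent respondents must still pick A or B, so they
# effectively flip a coin; respondents with a genuine preference pick A.
forced_choice_A = np.where(has_pref, True, rng.random(n) < 0.5)

# Format with an explicit "treat both equally" option: indifferent
# respondents take it, so only genuine preferences register as choices of A.
with_equal_option_A = has_pref

print(f"Forced choice: {forced_choice_A.mean():.1%} appear to prefer sparing A")
print(f"With 'equal' option: {with_equal_option_A.mean():.1%} prefer sparing A")
```

In this toy setup, only 10% of simulated respondents genuinely prefer sparing one group, yet the forced-choice format reports roughly 55% doing so, an apparent majority preference that is purely an artefact of the response format.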

Second, the construction of design spaces and the mapping of experiments onto them rely on valid measurement of design space dimensions. However, the validity of measurements, including those obtained from widely used measures, cannot be assumed. Consider Almaatouq et al.'s identification of social perceptiveness as a relevant dimension of group synergy research. They cite four studies that measured social perceptiveness using the Reading the Mind in the Eyes Test (RMET; Almaatouq, Alsobay, Yin, & Watts, 2021; Engel, Woolley, Jing, Chabris, & Malone, 2014; Kim et al., 2017; Woolley, Chabris, Pentland, Hashmi, & Malone, 2010). However, it is unclear what psychological constructs the RMET measures. While the RMET has been used to measure multiple dimensions of social cognition, including “theory of mind,” “emotion recognition,” “empathy,” “emotional intelligence,” “mindreading,” “mentalising,” and “social perceptiveness,” there is ongoing debate about the relationship between these constructs and about which, if any, of them the RMET actually measures (Kittel, Olderbak, & Wilhelm, 2022; Oakley, Brewer, Bird, & Catmur, 2016; Silverman, 2022). Moreover, despite the extensive use of the RMET (cited over 7,000 times according to Google Scholar), serious questions have been raised about the reliability and validity of RMET scores (Higgins, Ross, Langdon, & Polito, 2023; Higgins, Ross, Polito, & Kaplan, 2023; Kittel et al., 2022; Olderbak et al., 2015). This means that any integrative experiment that uses the RMET to measure social perceptiveness as a dimension of group synergy research will be very difficult to interpret. Given that vast swathes of measures used in psychological and social science research lack good validity evidence (Flake & Fried, 2020), analogous validity concerns are likely to exist for measures of many dimensions of a given design space. Thus, measurement validation is a critical and nontrivial consideration for the construction and implementation of the design spaces at the heart of the integrative approach. Moreover, given that design spaces are likely to include large numbers of dimensions, a coherent strategy for handling these issues must be developed; otherwise, the integrative approach risks becoming unmanageable in terms of magnitude and complexity.
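One concrete consequence of poor score reliability is worth spelling out: under classical test theory, unreliability attenuates observed correlations according to Spearman's attenuation formula, r_observed = r_true × √(rel_x × rel_y). The following sketch uses hypothetical reliability values, not actual RMET estimates, to show how a substantial true association can shrink in the observed data:

```python
import math

def attenuated_correlation(true_r: float, rel_x: float, rel_y: float) -> float:
    """Expected observed correlation under classical test theory:
    r_observed = r_true * sqrt(rel_x * rel_y)."""
    return true_r * math.sqrt(rel_x * rel_y)

# Hypothetical values: a true correlation of 0.40 between social
# perceptiveness and group performance, measured with a test whose
# scores have reliability 0.60 and an outcome with reliability 0.80.
r_obs = attenuated_correlation(true_r=0.40, rel_x=0.60, rel_y=0.80)
print(f"Expected observed correlation: {r_obs:.2f}")  # ~0.28
```

If the measures populating a design space are this noisy, effects that genuinely vary across the space will look weaker than they are, and the pattern of results across the space becomes correspondingly harder to interpret.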

Third, measurement incommensurability poses a substantial challenge to the feasibility and utility of the integrative approach because knowledge integration relies on valid and commensurable measurements. Consider depression, one of the most prevalent mental health conditions worldwide (Herrman et al., 2019). Fried, Flake, and Robinaugh (2022) recently identified over 280 different depression measures. Extensive variability in the symptoms assessed by these measures forced them to conclude that different depression measures “seem to measure different ‘depressions’” (p. 360). Moreover, they found that depression measures frequently fail to show measurement invariance, meaning that they might measure different things when used in different groups or contexts. Fried and colleagues’ examination of depression measures is an unusually thorough demonstration of just how serious measurement incommensurability problems can be. Nonetheless, there are indications that validity and commensurability problems extend to a diverse range of research areas which, troublingly, are also pertinent to human welfare, including child and adolescent psychopathology (Stevanovic et al., 2017); race-related attitudes, beliefs, and motivations (Hester, Axt, Siemers, & Hehman, 2023); and well-being (Alexandrova & Haybron, 2016). While Almaatouq et al. claim that their integrative approach “intrinsically promotes commensurability and continuous integration of knowledge” (target article, abstract), it is unclear how the approach can feasibly address incommensurability arising from the use of disparate measures and violations of measurement invariance. Left unaddressed, measurement incommensurability might substantially curtail the knowledge integration potential of the proposed approach.
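To make concrete what a violation of measurement invariance can do, the following sketch simulates two groups with identical latent depression severity in which a single item has a different intercept in one group (all numbers are hypothetical and the four-item “measure” is invented for illustration; this is not a model of any published depression scale). Naively comparing raw sum scores then manufactures an apparent group difference:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000                       # respondents per group (hypothetical)

# Both groups have identical latent depression severity distributions.
latent_a = rng.normal(0.0, 1.0, n)
latent_b = rng.normal(0.0, 1.0, n)

def item_scores(latent, intercepts, loadings, rng):
    """Generate continuous item responses from a one-factor model:
    item = intercept + loading * latent + noise."""
    noise = rng.normal(0.0, 0.5, (len(intercepts), len(latent)))
    return intercepts[:, None] + loadings[:, None] * latent + noise

loadings = np.array([0.7, 0.7, 0.7, 0.7])
intercepts_a = np.array([0.0, 0.0, 0.0, 0.0])
# Group B endorses one item more strongly for reasons unrelated to
# depression (an intercept difference: a scalar-invariance violation).
intercepts_b = np.array([0.0, 0.0, 0.0, 0.8])

sum_a = item_scores(latent_a, intercepts_a, loadings, rng).sum(axis=0)
sum_b = item_scores(latent_b, intercepts_b, loadings, rng).sum(axis=0)

print(f"Mean sum score, group A: {sum_a.mean():.2f}")
print(f"Mean sum score, group B: {sum_b.mean():.2f}")  # higher despite equal latent means
```

An integrative experiment that treated such sum scores as commensurable across groups would encode this item-level artefact directly into its design space as a spurious group difference.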

To summarise, although we are sympathetic to Almaatouq et al.'s ambitious attempt to tackle the substantial challenges in the psychological and behavioural sciences, their lack of engagement with the measurement literature raises serious questions about their approach. If it is to deliver its intended benefits of increased commensurability and knowledge integration, then measurement must be addressed explicitly. It is unclear to us whether this can be achieved while maintaining the feasibility of the proposed integrative approach.

Financial support

This work was supported by an Australian Government Research Training Program (RTP) Scholarship (W. C. H.), a Macquarie University Research Excellence Scholarship (W. C. H.), a Discovery Early Career Researcher Award (DECRA) from the Australian Research Council (ARC) (E. D., grant number DE220100087), and the John Templeton Foundation (R. M. R., grant number 62631; A. J. G., grant number 61924).

Competing interest

None.

References

Alexandrova, A., & Haybron, D. M. (2016). Is construct validation valid? Philosophy of Science, 83(5), 1098–1109. https://doi.org/10.1086/687941
Almaatouq, A., Alsobay, M., Yin, M., & Watts, D. J. (2021). Task complexity moderates group synergy. Proceedings of the National Academy of Sciences of the United States of America, 118(36), Article e2101062118. https://doi.org/10.1073/pnas.2101062118
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2018). The Moral Machine experiment. Nature, 563(7729), 59–64. https://doi.org/10.1038/s41586-018-0637-6
Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2020). Reply to: Life and death decisions of autonomous vehicles. Nature, 579(7797), E3–E5. https://doi.org/10.1038/s41586-020-1988-3
Bauman, C. W., McGraw, A. P., Bartels, D. M., & Warren, C. (2014). Revisiting external validity: Concerns about trolley problems and other sacrificial dilemmas in moral psychology. Social and Personality Psychology Compass, 8(9), 536–554. https://doi.org/10.1111/spc3.12131
Bigman, Y. E., & Gray, K. (2020). Life and death decisions of autonomous vehicles. Nature, 579(7797), E1–E2. https://doi.org/10.1038/s41586-020-1987-4
Bostyn, D. H., Sevenhant, S., & Roets, A. (2018). Of mice, men, and trolleys: Hypothetical judgment versus real-life behavior in trolley-style moral dilemmas. Psychological Science, 29(7), 1084–1093. https://doi.org/10.1177/0956797617752640
Engel, D., Woolley, A. W., Jing, L. X., Chabris, C. F., & Malone, T. W. (2014). Reading the mind in the eyes or reading between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face. PLoS ONE, 9(12), e115212. https://doi.org/10.1371/journal.pone.0115212
Flake, J. K., & Fried, E. I. (2020). Measurement schmeasurement: Questionable measurement practices and how to avoid them. Advances in Methods and Practices in Psychological Science, 3(4), 456–465. https://doi.org/10.1177/2515245920952393
Fried, E. I., Flake, J. K., & Robinaugh, D. J. (2022). Revisiting the theoretical and methodological foundations of depression measurement. Nature Reviews Psychology, 1(6), 358–368. https://doi.org/10.1038/s44159-022-00050-2
Herrman, H., Kieling, C., McGorry, P., Horton, R., Sargent, J., & Patel, V. (2019). Reducing the global burden of depression: A Lancet–World Psychiatric Association Commission. The Lancet, 393(10189), e42–e43. https://doi.org/10.1016/S0140-6736(18)32408-5
Hester, N., Axt, J. R., Siemers, N., & Hehman, E. (2023). Evaluating validity properties of 25 race-related scales. Behavior Research Methods, 55(4), 1758–1777. https://doi.org/10.3758/s13428-022-01873-w
Higgins, W. C., Ross, R. M., Langdon, R., & Polito, V. (2023). The “reading the mind in the eyes” test shows poor psychometric properties in a large, demographically representative U.S. sample. Assessment, 30(6), 1777–1789. https://doi.org/10.1177/10731911221124342
Higgins, W. C., Ross, R. M., Polito, V., & Kaplan, D. M. (2023). Three threats to the validity of the reading the mind in the eyes test: A commentary on Pavlova and Sokolov (2022). Neuroscience and Biobehavioral Reviews, 147, 105088. https://doi.org/10.1016/j.neubiorev.2023.105088
Kim, Y. J., Engel, D., Woolley, A. W., Lin, J. Y.-T., McArthur, N., & Malone, T. W. (2017). What makes a strong team? Using collective intelligence to predict team performance in League of Legends. In Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing (pp. 2316–2329). Association for Computing Machinery. https://doi.org/10.1145/2998181.2998185
Kittel, A. F. D., Olderbak, S., & Wilhelm, O. (2022). Sty in the mind's eye: A meta-analytic investigation of the nomological network and internal consistency of the “reading the mind in the eyes” test. Assessment, 29(5), 872–895. https://doi.org/10.1177/1073191121996469
Lilienfeld, S. O., & Strother, A. N. (2020). Psychological measurement and the replication crisis: Four sacred cows. Canadian Psychology, 61(4), 281–288. https://doi.org/10.1037/cap0000236
Oakley, B. F., Brewer, R., Bird, G., & Catmur, C. (2016). Theory of mind is not theory of emotion: A cautionary note on the reading the mind in the eyes test. Journal of Abnormal Psychology, 125(6), 818–823. https://doi.org/10.1037/abn0000182
Olderbak, S., Wilhelm, O., Olaru, G., Geiger, M., Brenneman, M. W., & Roberts, R. D. (2015). A psychometric analysis of the reading the mind in the eyes test: Toward a brief form for research and applied settings. Frontiers in Psychology, 6, 1503. https://doi.org/10.3389/fpsyg.2015.01503
Silverman, C. (2022). How to read “reading the mind in the eyes”. Notes and Records of the Royal Society of London, 76(4), 683–697. https://doi.org/10.1098/rsnr.2021.0058
Stevanovic, D., Jafari, P., Knez, R., Franic, T., Atilola, O., Davidovic, N., … Lakic, A. (2017). Can we really use available scales for child and adolescent psychopathology across cultures? A systematic review of cross-cultural measurement invariance data. Transcultural Psychiatry, 54(1), 125–152. https://doi.org/10.1177/1363461516689215
Vazire, S., Schiavone, S. R., & Bottesini, J. G. (2022). Credibility beyond replicability: Improving the four validities in psychological science. Current Directions in Psychological Science, 31(2), 162–168. https://doi.org/10.1177/09637214211067779
Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science, 330(6004), 686–688. https://doi.org/10.1126/science.1193147