Beyond playing 20 questions with nature: Integrative experiment design in the social and behavioral sciences

Abdullah Almaatouq; Thomas L. Griffiths; Jordan W. Suchow; Mark E. Whiting; James Evans; Duncan J. Watts

doi:10.1017/S0140525X22002874

Published online by Cambridge University Press: 21 December 2022

Abdullah Almaatouq*: Sloan School of Management, Massachusetts Institute of Technology, Cambridge, MA, USA; amaatouq@mit.edu
Thomas L. Griffiths: Departments of Psychology and Computer Science, Princeton University, Princeton, NJ, USA; tomg@princeton.edu
Jordan W. Suchow: School of Business, Stevens Institute of Technology, Hoboken, NJ, USA; jws@stevens.edu
Mark E. Whiting: School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA; markew@seas.upenn.edu
James Evans: Department of Sociology, University of Chicago, Chicago, IL, USA; Santa Fe Institute, Santa Fe, NM, USA; jevans@uchicago.edu
Duncan J. Watts: Department of Computer and Information Science, Annenberg School of Communication, and Operations, Information, and Decisions Department, University of Pennsylvania, Philadelphia, PA, USA; djwatts@seas.upenn.edu
*Corresponding author: Abdullah Almaatouq; Email: amaatouq@mit.edu

Abstract

The dominant paradigm of experiments in the social and behavioral sciences views an experiment as a test of a theory, where the theory is assumed to generalize beyond the experiment's specific conditions. According to this view, which Allen Newell once characterized as “playing twenty questions with nature,” theory is advanced one experiment at a time, and the integration of disparate findings is assumed to happen via the scientific publishing process. In this article, we argue that the process of integration is at best inefficient, and at worst it does not, in fact, occur. We further show that the challenge of integration cannot be adequately addressed by recently proposed reforms that focus on the reliability and replicability of individual findings, nor simply by conducting more or larger experiments. Rather, the problem arises from the imprecise nature of social and behavioral theories and, consequently, a lack of commensurability across experiments conducted under different conditions. Therefore, researchers must fundamentally rethink how they design experiments and how the experiments relate to theory. We specifically describe an alternative framework, integrative experiment design, which intrinsically promotes commensurability and continuous integration of knowledge. In this paradigm, researchers explicitly map the design space of possible experiments associated with a given research question, embracing many potentially relevant theories rather than focusing on just one. Researchers then iteratively generate theories and test them with experiments explicitly sampled from the design space, allowing results to be integrated across experiments. Given recent methodological and technological developments, we conclude that this approach is feasible and would generate more-reliable, more-cumulative empirical and theoretical knowledge than the current paradigm, and with far greater efficiency.
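
As a rough illustration of what "mapping a design space" and "sampling experiments from it" could look like in practice, the following minimal Python sketch enumerates every combination of a few design dimensions and draws a random batch of conditions to run. The dimension names and levels (`group_size`, `task_complexity`, `communication`) and the helper names are hypothetical and chosen only for illustration; they are not taken from the article, and uniform random sampling stands in for whatever sampling strategy a real study would use.

```python
import itertools
import random

# Hypothetical design space: each key is a dimension along which
# experiments on a research question might differ (illustrative only).
DESIGN_SPACE = {
    "group_size": [1, 3, 6],
    "task_complexity": ["low", "medium", "high"],
    "communication": ["none", "chat"],
}


def enumerate_conditions(space):
    """Return every point in the design space as a dict of dimension -> level."""
    names = list(space)
    levels = (space[name] for name in names)
    return [dict(zip(names, combo)) for combo in itertools.product(*levels)]


def sample_conditions(space, k, seed=0):
    """Draw k distinct conditions uniformly at random from the design space."""
    rng = random.Random(seed)  # seeded so the batch is reproducible
    return rng.sample(enumerate_conditions(space), k)


conditions = enumerate_conditions(DESIGN_SPACE)
batch = sample_conditions(DESIGN_SPACE, k=5)

print(f"{len(conditions)} possible conditions; running {len(batch)} of them:")
for condition in batch:
    print(condition)
```

Because every sampled condition is a point in the same explicitly defined space, results from different batches share coordinates and can be compared directly. An iterative version would replace the uniform draw with a rule that picks the next batch based on earlier results.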

Keywords

cumulative knowledge; experiments; generalizability; (in)commensurability

Type: Target Article
Information: Behavioral and Brain Sciences, Volume 47, 2024, e33

DOI: https://doi.org/10.1017/S0140525X22002874
Copyright: Copyright © The Author(s), 2022. Published by Cambridge University Press

References

Aad, G., Abajyan, T., Abbott, B., Abdallah, J., Abdel Khalek, S., Abdelalim, A. A., … Zwalinski, L. (2012). Observation of a new particle in the search for the Standard Model Higgs Boson with the ATLAS detector at the LHC. Physics Letters, Part B, 716(1), 1–29.

Abbott, B. P., Abbott, R., Abbott, T. D., Abernathy, M. R., Acernese, F., Ackley, K., … LIGO Scientific Collaboration and Virgo Collaboration. (2016). Observation of gravitational waves from a binary black hole merger. Physical Review Letters, 116(6), 061102.

Aggarwal, I., & Woolley, A. W. (2018). Team creativity, cognition, and cognitive style diversity. Management Science, 65(4), 1586–1599. https://doi.org/10.1287/mnsc.2017.3001

Agrawal, M., Peterson, J. C., & Griffiths, T. L. (2020). Scaling up psychology via scientific regret minimization. Proceedings of the National Academy of Sciences of the United States of America, 117(16), 8825–8835.

Allen, L., Scott, J., Brand, A., Hlava, M., & Altman, M. (2014). Publishing: Credit where credit is due. Nature, 508(7496), 312–313.

Allen, N. J., & Hecht, T. D. (2004). The “romance of teams”: Toward an understanding of its psychological underpinnings and implications. Journal of Occupational and Organizational Psychology, 77(4), 439–461.

Allport, F. H. (1924). The group fallacy in relation to social science. The American Journal of Sociology, 29(6), 688–706.

Almaatouq, A. (2019). Towards stable principles of collective intelligence under an environment-dependent framework. Massachusetts Institute of Technology. https://dspace.mit.edu/handle/1721.1/123223?show=full?show=full

Almaatouq, A., Alsobay, M., Yin, M., & Watts, D. J. (2021a). Task complexity moderates group synergy. Proceedings of the National Academy of Sciences of the United States of America, 118(36), e2101062118. https://doi.org/10.1073/pnas.2101062118

Almaatouq, A., Becker, J., Houghton, J. P., Paton, N., Watts, D. J., & Whiting, M. E. (2021b). Empirica: A virtual lab for high-throughput macro-level experiments. Behavior Research Methods, 53, 2158–2171. https://doi.org/10.3758/s13428-020-01535-9

Almaatouq, A., Noriega-Campero, A., Alotaibi, A., Krafft, P. M., Moussaid, M., & Pentland, A. (2020). Adaptive social networks promote the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America, 117(21), 11379–11386.

Almaatouq, A., Rahimian, M. A., Burton, J. W., & Alhajri, A. (2022). The distribution of initial estimates moderates the effect of social influence on the wisdom of the crowd. Scientific Reports, 12(1), 16546.

Many Primates, Altschul, D. M., Beran, M. J., Bohn, M., Call, J., DeTroy, S., Duguid, S. J., … Watzek, J. (2019). Establishing an infrastructure for collaboration in primate cognition research. PLoS ONE, 14(10), e0223675.

Arrow, H., McGrath, J. E., & Berdahl, J. L. (2000). Small groups as complex systems: Formation, coordination, development, and adaptation. Sage.

Atkinson, A. C., & Donev, A. N. (1992). Optimum experimental designs (Oxford statistical science series, 8) (1st ed.). Clarendon Press.

Aumann, R. J., & Hart, S. (1992). Handbook of game theory with economic applications. Elsevier.

Auspurg, K., & Hinz, T. (2014). Factorial survey experiments. Sage.

Awad, E., Dsouza, S., Bonnefon, J.-F., Shariff, A., & Rahwan, I. (2020). Crowdsourcing moral machines. Communications of the ACM, 63(3), 48–55.

Awad, E., Dsouza, S., Kim, R., Schulz, J., Henrich, J., Shariff, A., … Rahwan, I. (2018). The Moral Machine experiment. Nature, 563(7729), 59–64.

Bakshy, E., Dworkin, L., Karrer, B., Kashin, K., Letham, B., Murthy, A., & Singh, S. (2018). AE: A domain-agnostic platform for adaptive experimentation. Workshop on System for ML. http://learningsys.org/nips18/assets/papers/87CameraReadySubmissionAE%20-%20NeurIPS%202018.pdf

Balandat, M., Karrer, B., Jiang, D. R., Daulton, S., Letham, B., Wilson, A. G., & Bakshy, E. (2020). BoTorch: A framework for efficient Monte-Carlo Bayesian optimization. In Proceedings of the 34th International Conference on Neural Information Processing Systems (NIPS'20) (pp. 21524–21538). Curran Associates Inc.

Balietti, S. (2017). NodeGame: Real-time, synchronous, online experiments in the browser. Behavior Research Methods, 49(5), 1696–1715.

Balietti, S., Klein, B., & Riedl, C. (2021). Optimal design of experiments to identify latent behavioral types. Experimental Economics, 24, 772–799. https://doi.org/10.1007/s10683-020-09680-w

Baribault, B., Donkin, C., Little, D. R., Trueblood, J. S., Oravecz, Z., van Ravenzwaaij, D., … Vandekerckhove, J. (2018). Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2607–2612.

Barron, B. (2003). When smart groups fail. Journal of the Learning Sciences, 12(3), 307–359.

Becker, J., Brackbill, D., & Centola, D. (2017). Network dynamics of social influence in the wisdom of crowds. Proceedings of the National Academy of Sciences of the United States of America, 114(26), E5070–E5076.

Bell, S. T. (2007). Deep-level composition variables as predictors of team performance: A meta-analysis. The Journal of Applied Psychology, 92(3), 595–615.

Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E.-J., Berk, R., … Camerer, C. (2018). Redefine statistical significance. Nature Human Behaviour, 2, 6–10. https://doi.org/10.1038/s41562-017-0189-z

Berkman, E. T., & Wilson, S. M. (2021). So useful as a good theory? The practicality crisis in (social) psychological theory. Perspectives on Psychological Science, 16(4), 864–874. https://doi.org/10.1177/1745691620969650

Bertrand, M., & Mullainathan, S. (2004). Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. The American Economic Review, 94(4), 991–1013.

Bourgin, D. D., Peterson, J. C., Reichman, D., Russell, S. J., & Griffiths, T. L. (2019). Cognitive model priors for predicting human decisions. In Chaudhuri, K. & Salakhutdinov, R. (Eds.), Proceedings of the 36th international conference on machine learning (Vol. 97, pp. 5133–5141). PMLR.

Bowen, D. (n.d.). Hemlock. Retrieved April 22, 2022, from https://dsbowen.gitlab.io/hemlock

Brewin, C. R. (2022). Impact on the legal system of the generalizability crisis in psychology. The Behavioral and Brain Sciences, 45, e7.

Breznau, N., Rinke, E. M., Wuttke, A., Nguyen, H. H. V., Adem, M., Adriaans, J., … Żółtak, T. (2022). Observing many researchers using the same data and hypothesis reveals a hidden universe of uncertainty. Proceedings of the National Academy of Sciences of the United States of America, 119(44), e2203150119.

Brunswik, E. (1947). Systematic and representative design of psychological experiments. In Proceedings of the Berkeley symposium on mathematical statistics and probability (pp. 143–202). University of California Press.

Brunswik, E. (1955). Representative design and probabilistic theory in a functional psychology. Psychological Review, 62(3), 193–217.

Burger, B., Maffettone, P. M., Gusev, V. V., Aitchison, C. M., Bai, Y., Wang, X., … Cooper, A. I. (2020). A mobile robotic chemist. Nature, 583(7815), 237–241.

Byers-Heinlein, K., Bergmann, C., Davies, C., Frank, M. C., Kiley Hamlin, J., Kline, M., … Soderstrom, M. (2020). Building a collaborative psychological science: Lessons learned from ManyBabies 1. Canadian Psychology/Psychologie Canadienne, 61(4), 349–363. https://doi.org/10.1037/cap0000216

Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T.-H., Huber, J., Johannesson, M., … Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644.

Carter, E. C., Schönbrodt, F. D., Gervais, W. M., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2(2), 115–144.

Cesario, J. (2014). Priming, replication, and the hardest science. Perspectives on Psychological Science, 9(1), 40–48. https://doi.org/10.1177/1745691613513470

Cesario, J. (2022). What can experimental studies of bias tell us about real-world group disparities? Behavioral and Brain Sciences, 45, E66. https://doi.org/10.1017/S0140525X21000017

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.

Cohen, J. (1994). The earth is round (p < .05). The American Psychologist, 49(12), 997.

Cooper, H., Hedges, L. V., & Valentine, J. C. (Eds.) (2019). The handbook of research synthesis and meta-analysis. Russell Sage Foundation.

Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.

Debrouwere, S., & Rosseel, Y. (2022). The conceptual, cunning and conclusive experiment in psychology. Perspectives on Psychological Science, 17(3), 852–862. https://doi.org/10.1177/17456916211026947

DeKay, M. L., Rubinchik, N., Li, Z., & De Boeck, P. (2022). Accelerating psychological science with metastudies: A demonstration using the risky-choice framing effect. Perspectives on Psychological Science, 17(6), 1704–1736. https://doi.org/10.1177/17456916221079611

de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods, 47(1), 1–12.

de Leeuw, J. R., Motz, B. A., Fyfe, E. R., Carvalho, P. F., & Goldstone, R. L. (2022). Generalizability, transferability, and the practice-to-practice gap. The Behavioral and Brain Sciences, 45, e11.

Devine, D. J., Clayton, L. D., Dunford, B. B., Seying, R., & Pryce, J. (2001). Jury decision making: 45 years of empirical research on deliberating groups. Psychology, Public Policy, and Law, 7(3), 622–727.

Devine, D. J., & Philips, J. L. (2001). Do smarter teams do better: A meta-analysis of cognitive ability and team performance. Small Group Research, 32(5), 507–532.

Dienes, Z. (2008). Understanding psychology as a science: An introduction to scientific and statistical inference. Macmillan.

Dubova, M., Moskvichev, A., & Zollman, K. (2022). Against theory-motivated experimentation in science. MetaArXiv. June 24. https://doi.org/10.31222/osf.io/ysv2u

Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., … Nosek, B. A. (2016). Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82.

Ellemers, N., & Rink, F. (2016). Diversity in work groups. Current Opinion in Psychology, 11, 49–53.

Engel, D., Woolley, A. W., Jing, L. X., Chabris, C. F., & Malone, T. W. (2014). Reading the mind in the eyes or reading between the lines? Theory of mind predicts collective intelligence equally well online and face-to-face. PLoS ONE, 9(12), e115212.

Erev, I., Ert, E., Plonsky, O., Cohen, D., & Cohen, O. (2017). From anomalies to forecasts: Toward a descriptive model of decisions under risk, under ambiguity, and from experience. Psychological Review, 124(4), 369–409.

Eyke, N. S., Green, W. H., & Jensen, K. F. (2020). Iterative experimental design based on active machine learning reduces the experimental burden associated with reaction screening. Reaction Chemistry & Engineering, 5(10), 1963–1972.

Eyke, N. S., Koscher, B. A., & Jensen, K. F. (2021). Toward machine learning-enhanced high-throughput experimentation. Trends in Chemistry, 3(2), 120–132.

Fehr, E., & Gächter, S. (2000). Cooperation and punishment in public goods experiments. The American Economic Review, 90(4), 980–994.

Freese, J., & Peterson, D. (2017). Replication in social science. Annual Review of Sociology, 43, 147–165. https://doi.org/10.1146/annurev-soc-060116-053450

Fyfe, E. R., de Leeuw, J. R., Carvalho, P. F., Goldstone, R. L., Sherman, J., Admiraal, D., … Motz, B. A. (2021). ManyClasses 1: Assessing the generalizable effect of immediate feedback versus delayed feedback across many college classes. Advances in Methods and Practices in Psychological Science, 4(3), 25152459211027575.

Gale, D., & Shapley, L. S. (1962). College admissions and the stability of marriage. The American Mathematical Monthly, 69(1), 9–15.

Gelman, A. (2018). Don't characterize replications as successes or failures. The Behavioral and Brain Sciences, 41, e128.

Gelman, A., & Carlin, J. (2017). Some natural solutions to the p-value communication problem – and why they won't work. Journal of the American Statistical Association, 112(519), 899–901.

Gelman, A., & Loken, E. (2014). The statistical crisis in science: Data-dependent analysis – a “garden of forking paths” – explains why many statistically significant comparisons don't hold up. American Scientist, 102(6), 460.

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4(1), 1–58.

Gongora, A. E., Xu, B., Perry, W., Okoye, C., Riley, P., Reyes, K. G., … Brown, K. A. (2020). A Bayesian experimental autonomous researcher for mechanical design. Science Advances, 6(15), eaaz1708.

Goodman, J. K., Cryder, C. E., & Cheema, A. (2013). Data collection in a flat world: The strengths and weaknesses of Mechanical Turk samples. Journal of Behavioral Decision Making, 26(3), 213–224.

Greenhill, S., Rana, S., Gupta, S., Vellanki, P., & Venkatesh, S. (2020). Bayesian optimization for adaptive experimental design: A review. IEEE Access, 8, 13937–13948.

Griffiths, T. L. (2015). Manifesto for a new (computational) cognitive revolution. Cognition, 135, 21–23.

Grubbs, J. B. (2022). The cost of crisis in clinical psychological science. The Behavioral and Brain Sciences, 45, e18.

Hackman, J. R. (1968). Effects of task characteristics on group products. Journal of Experimental Social Psychology, 4(2), 162–187.

Harkins, S. G. (1987). Social loafing and social facilitation. Journal of Experimental Social Psychology, 23(1), 1–18.

Hartshorne, J. K., de Leeuw, J. R., Goodman, N. D., Jennings, M., & O'Donnell, T. J. (2019). A thousand studies for the price of one: Accelerating psychological science with Pushkin. Behavior Research Methods, 51(4), 1782–1803. https://doi.org/10.3758/s13428-018-1155-z

Henrich, J., Heine, S., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61–83. https://doi.org/10.1017/S0140525X0999152X

Higgins, J. P. T., Thompson, S. G., Deeks, J. J., & Altman, D. G. (2003). Measuring inconsistency in meta-analyses. BMJ, 327(7414), 557–560.

Hill, G. W. (1982). Group versus individual performance: Are N + 1 heads better than one? Psychological Bulletin, 91(3), 517–539.

Hofman, J. M., Sharma, A., & Watts, D. J. (2017). Prediction and explanation in social systems. Science (New York, N.Y.), 355(6324), 486–488.

Hofman, J. M., Watts, D. J., Athey, S., Garip, F., Griffiths, T. L., Kleinberg, J., … Yarkoni, T. (2021). Integrating explanation and prediction in computational social science. Nature, 595(7866), 181–188.

Hofstede, G. (2016). Culture's consequences: Comparing values, behaviors, institutions, and organizations across nations (2nd ed.). Collegiate Aviation Review, 34(2), 108–109. Retrieved from https://www.proquest.com/scholarly-journals/cultures-consequences-comparing-values-behaviors/docview/1841323332/se-2

Hong, L., & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. Proceedings of the National Academy of Sciences of the United States of America, 101(46), 16385–16389.

Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399–425.

Husband, R. W. (1940). Cooperative versus solitary problem solution. The Journal of Social Psychology, 11(2), 405–409.

Inglehart, R., & Welzel, C. (2005). Modernization, cultural change, and democracy: The human development sequence. Cambridge University Press.

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.

Janis, I. L. (1972). Victims of groupthink: A psychological study of foreign-policy decisions and fiascoes (p. 277). Houghton Mifflin Company. https://psycnet.apa.org/fulltext/1975-29417-000.pdf

Jones, B. C., DeBruine, L. M., Flake, J. K., Liuzza, M. T., Antfolk, J., Arinze, N. C., … Coles, N. A. (2021). To which world regions does the valence-dominance model of social perception apply? Nature Human Behaviour, 5(1), 159–169.

Jumper, J., Evans, R., Pritzel, A., Green, T., Figurnov, M., Ronneberger, O., … Hassabis, D. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583–589.

Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica: Journal of the Econometric Society, 47(2), 263–291.

Karau, S. J., & Williams, K. D. (1993). Social loafing: A meta-analytic review and theoretical integration. Journal of Personality and Social Psychology, 65(4), 681–706.

Kim, Y. J., Engel, D., Woolley, A. W., Lin, J. Y.-T., McArthur, N., & Malone, T. W. (2017). What makes a strong team?: Using collective intelligence to predict team performance in League of Legends. Proceedings of the 2017 ACM conference on computer supported cooperative work and social computing – CSCW ’17 (pp. 2316–2329). New York, NY, USA.

Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B., Bahník, Š., Bernstein, M. J., … Nosek, B. A. (2014). Investigating variation in replicability. Social Psychology, 45(3), 142–152.

Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Alper, S., … Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443–490.

Knudde, N., van der Herten, J., Dhaene, T., & Couckuyt, I. (2017). GPflowOpt: A Bayesian optimization library using TensorFlow. arXiv [stat.ML]. arXiv. http://arxiv.org/abs/1711.03845

Koyré, A. (1953). An experiment in measurement. Proceedings of the American Philosophical Society, 97(2), 222–237.

Lakens, D., Uygun Tunç, D., & Necip Tunç, M. (2022). There is no generalizability crisis. The Behavioral and Brain Sciences, 45, e25.

Landy, J. F., Jia, M. L., Ding, I. L., Viganola, D., Tierney, W., Dreber, A., … Uhlmann, E. L. (2020). Crowdsourcing hypothesis tests: Making transparent how design choices shape research results. Psychological Bulletin, 146(5), 451–479.

Larson, J. R. (2013). In search of synergy in small group performance. Psychology Press.

Larson, S. D., & Martone, M. E. (2009). Ontologies for neuroscience: What are they and what are they good for? Frontiers in Neuroscience, 3(1), 60–67. https://doi.org/10.3389/neuro.01.007.2009

Laughlin, P. R., Bonner, B. L., & Miner, A. G. (2002). Groups perform better than the best individuals on letters-to-numbers problems. Organizational Behavior and Human Decision Processes, 88(2), 605–620.

Lei, B., Kirk, T. Q., Bhattacharya, A., Pati, D., Qian, X., Arroyave, R., & Mallick, B. K. (2021). Bayesian optimization with adaptive surrogate models for automated experimental design. NPJ Computational Materials, 7(1), 1–12.

LePine, J. A. (2003). Team adaptation and postchange performance: Effects of team composition in terms of members’ cognitive ability and personality. The Journal of Applied Psychology, 88(1), 27–39.

Letham, B., Karrer, B., Ottoni, G., & Bakshy, E. (2019). Constrained Bayesian optimization with noisy experiments. Bayesian Analysis, 14(2), 495–519. https://doi.org/10.1214/18-ba1110

Levinthal, D. A., & Rosenkopf, L. (2021). Commensurability and collective impact in strategic management research: When non-replicability is a feature, not a bug. Working-paper (unpublished preprint). https://mackinstitute.wharton.upenn.edu/2020/commensurability-and-collective-impact-in-strategic-management-research/

Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? The Journal of Economic Perspectives: A Journal of the American Economic Association, 21(2), 153–174.

Li, W., Germine, L. T., Mehr, S. A., Srinivasan, M., & Hartshorne, J. (2022). Developmental psychologists should adopt citizen science to improve generalization and reproducibility. Infant and Child Development, e2348. https://doi.org/10.1002/icd.2348

Litman, L., Robinson, J., & Abberbock, T. (2017). TurkPrime.com: A versatile crowdsourcing data acquisition platform for the behavioral sciences. Behavior Research Methods, 49(2), 433–442.

MacWhinney, B. (2014). The CHILDES project: Tools for analyzing talk, volume II: The database (3rd ed.). Psychology Press. https://doi.org/10.4324/9781315805641

Maier, M., Bartoš, F., Stanley, T. D., Shanks, D. R., Harris, A. J. L., & Wagenmakers, E.-J. (2022). No evidence for nudging after adjusting for publication bias. Proceedings of the National Academy of Sciences of the United States of America, 119(31), e2200300119.

ManyBabies Consortium. (2020). Quantifying sources of variability in infancy research using the infant-directed-speech preference. Advances in Methods and Practices in Psychological Science, 3(1), 24–52.

Manzi, J. (2012). Uncontrolled: The surprising payoff of trial-and-error for business, politics, and society (pp. 1–320). Basic Books.

Mao, A., Mason, W., Suri, S., & Watts, D. J. (2016). An experimental study of team size and performance on a complex task. PLoS ONE, 11(4), e0153048.

Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J. (2016). Exploring limits to prediction in complex social systems. In Proceedings of the 25th international conference on world wide web no. 978-1-4503-4143-1 (pp. 683–694). Republic and Canton of Geneva, CHE. International World Wide Web Conferences Steering Committee.

Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods, 44(1), 1–23.

Mason, W., & Watts, D. J. (2012). Collaborative learning in networks. Proceedings of the National Academy of Sciences of the United States of America, 109(3), 764–769.

McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2(1), 3–19.

McGrath, J. E. (1984). Groups: Interaction and performance. Prentice Hall.

Meehl, P. E. (1967). Theory-testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(2), 103–115.

Meehl, P. E. (1990a). Why summaries of research on psychological theories are often uninterpretable. Psychological Reports, 66(1), 195–244.

Meehl, P. E. (1990b). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1(2), 108–141.

Mertens, S., Herberz, M., Hahnel, U. J. J., & Brosch, T. (2022). The effectiveness of nudging: A meta-analysis of choice architecture interventions across behavioral domains. Proceedings of the National Academy of Sciences of the United States of America, 119(1). https://doi.org/10.1073/pnas.2107346118

Merton, R. K. (1968). On sociological theories of the middle range. Social Theory and Social Structure, 39–72.

Milkman, K. L., Gandhi, L., Patel, M. S., Graci, H. N., Gromet, D. M., Ho, H., … Duckworth, A. L. (2022). A 680,000-person megastudy of nudges to encourage vaccination in pharmacies. Proceedings of the National Academy of Sciences of the United States of America, 119(6). https://doi.org/10.1073/pnas.2115126119

Milkman, K. L., Patel, M. S., Gandhi, L., Graci, H. N., Gromet, D. M., Ho, H., … Duckworth, A. L. (2021). A megastudy of text-based nudges encouraging patients to get vaccinated at an upcoming doctor's appointment. Proceedings of the National Academy of Sciences of the United States of America, 118(20), e2101165118.

Mook, D. G. (1983). In defense of external invalidity. The American Psychologist, 38(4), 379–387.

Moshontz, H., Campbell, L., Ebersole, C. R., IJzerman, H., Urry, H. L., Forscher, P. S., … Chartier, C. R. (2018). The psychological science accelerator: Advancing psychology through a distributed collaborative network. Advances in Methods and Practices in Psychological Science, 1(4), 501–515.

Munafò, M. R., Nosek, B. A., Bishop, D. V. M., Button, K. S., Chambers, C. D., du Sert, N. P., … Ioannidis, J. P. A. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1, 21.

Muthukrishna, M., Bell, A. V., Henrich, J., Curtin, C. M., Gedranovich, A., McInerney, J., & Thue, B. (2020). Beyond western, educated, industrial, rich, and democratic (WEIRD) psychology: Measuring and mapping scales of cultural and psychological distance. Psychological Science, 31(6), 678–701.CrossRef Google Scholar PubMed

Muthukrishna, M., & Henrich, J. A. (2019). A problem in theory. Nature Human Behaviour, 3, 221–229. https://doi.org/10.1038/s41562-018-0522-1

Myerson, R. B. (1981). Optimal auction design. Mathematics of Operations Research, 6(1), 58–73.

National Information Standards Organization. (2022). ANSI/NISO Z39.104-2022, CRediT, contributor roles taxonomy. [S. L.]. National Information Standards Organization. https://www.niso.org/publications/z39104-2022-credit

National Science Foundation. (2022). NSF budget requests to congress and annual appropriations. National Science Foundation. https://www.nsf.gov/about/budget/

Nemesure, M. D., Heinz, M. V., Huang, R., & Jacobson, N. C. (2021). Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence. Scientific Reports, 11(1), 1980.

Newell, A. (1973). You can't play 20 questions with nature and win: Projective comments on the papers of this symposium. http://shelf2.library.cmu.edu/Tech/240474311.pdf

Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science (New York, N.Y.), 349(6251), aac4716.

Page, S. E. (2008). The difference: How the power of diversity creates better groups, firms, schools, and societies – New edition. Princeton University Press.

Palan, S., & Schitter, C. (2018). Prolific.ac – A subject pool for online experiments. Journal of Behavioral and Experimental Finance, 17, 22–27.

Peterson, J. C., Bourgin, D. D., Agrawal, M., Reichman, D., & Griffiths, T. L. (2021). Using large-scale experiments and machine learning to discover theories of human decision-making. Science (New York, N.Y.), 372(6547), 1209–1214.

Plonsky, O., Apel, R., Ert, E., Tennenholtz, M., Bourgin, D., Peterson, J. C., … Erev, I. (2019). Predicting human decisions with behavioral theories and machine learning. arXiv [cs.AI]. arXiv. http://arxiv.org/abs/1904.06866

Preckel, F., & Brunner, M. (2017). Nomological nets. Encyclopedia of Personality and Individual Differences, 1–4. https://doi.org/10.1007/978-3-319-28099-8_1334-1

Ren, P., Xiao, Y., Chang, X., Huang, P.-Y., Li, Z., Gupta, B. B., … Wang, X. (2021). A survey of deep active learning. ACM Computing Surveys, 54(9), 1–40.

Reuss, H., Kiesel, A., & Kunde, W. (2015). Adjustments of response speed and accuracy to unconscious cues. Cognition, 134, 57–62.

Hackman, J. R., & Morris, C. G. (1975). Group tasks, group interaction process, and group performance effectiveness: A review and proposed integration. In Berkowitz, L. (Ed.), Advances in Experimental Social Psychology (Vol. 8, pp. 45–99). Academic Press. https://doi.org/10.1016/s0065-2601(08)60248-8

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638–641.

Rubin, D. L., Lewis, S. E., Mungall, C. J., Misra, S., Westerfield, M., Ashburner, M., … Musen, M. A. (2006). National center for biomedical ontology: Advancing biomedicine through structured organization of scientific knowledge. OMICS: A Journal of Integrative Biology, 10(2), 185–198. https://doi.org/10.1089/omi.2006.10.185

Schneid, M., Isidor, R., Li, C., & Kabst, R. (2015). The influence of cultural context on the relationship between gender diversity and team performance: A meta-analysis. The International Journal of Human Resource Management, 26(6), 733–756.

Schulz-Hardt, S., & Mojzisch, A. (2012). How to achieve synergy in group decision making: Lessons to be learned from the hidden profile paradigm. European Review of Social Psychology, 23(1), 305–343.

Schwartz, S. (2006). A theory of cultural value orientations: Explication and applications. Comparative Sociology, 5(2–3), 137–182.

Settles, B. (2011). From theories to queries: Active learning in practice. In Guyon, I., Cawley, G., Dror, G., Lemaire, V., & Statnikov, A. (Eds.), Active learning and experimental design workshop in conjunction with AISTATS 2010 (Vol. 16, pp. 1–18). PMLR.

Shallue, C. J., & Vanderburg, A. (2018). Identifying exoplanets with deep learning: A five-planet resonant chain around Kepler-80 and an eighth planet around Kepler-90. The Astronomical Journal, 155(2), 94.

Shaw, M. E. (1963). Scaling group tasks: A method for dimensional analysis. https://apps.dtic.mil/sti/pdfs/AD0415033.pdf

Shields, B. J., Stevens, J., Li, J., Parasram, M., Damani, F., Alvarado, J. I. M., … Doyle, A. G. (2021). Bayesian reaction optimization as a tool for chemical synthesis. Nature, 590(7844), 89–96.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.

Simons, D. J., Shoda, Y., & Lindsay, D. S. (2017). Constraints on generality (COG): A proposed addition to all empirical papers. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1123–1128.

Simonsohn, U., Simmons, J., & Nelson, L. D. (2022). Above averaging in literature reviews. Nature Reviews Psychology, 1(10), 551–552.

Smucker, B., Krzywinski, M., & Altman, N. (2018). Optimal experimental design. Nature Methods, 15(8), 559–560.

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian optimization of machine learning algorithms. arXiv [stat.ML]. arXiv. http://arxiv.org/abs/1206.2944

Steiner, I. D. (1972). Group process and productivity. Academic Press.

Stewart, G. L. (2006). A meta-analytic review of relationships between team design features and team performance. Journal of Management, 32(1), 29–55.

Stokes, D. E. (1997). Pasteur's quadrant: Basic science and technological innovation. Brookings Institution Press.

Szaszi, B., Higney, A., Charlton, A., Gelman, A., Ziano, I., Aczel, B., … Tipton, E. (2022). No reason to expect large and consistent effects of nudge interventions. Proceedings of the National Academy of Sciences of the United States of America, 119(31), e2200732119.

Tasca, G. A. (2021). Team cognition and reflective functioning: A review and search for synergy. Group Dynamics: Theory, Research, and Practice, 25(3), 258–270.

Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3–4), 285–294.

Turner, J. A., & Laird, A. R. (2012). The cognitive paradigm ontology: Design and application. Neuroinformatics, 10(1), 57–66.

Turner, M. A., & Smaldino, P. E. (2022). Mechanistic modeling for the masses. The Behavioral and Brain Sciences, 45, e33.

Uhlmann, E. L., Ebersole, C. R., Chartier, C. R., Errington, T. M., Kidwell, M. C., Lai, C. K., … Nosek, B. A. (2019). Scientific utopia III: Crowdsourcing science. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 14(5), 711–733.

Van Bavel, J. J., Mende-Siedlecki, P., Brady, W. J., & Reinero, D. A. (2016). Contextual sensitivity in scientific reproducibility. Proceedings of the National Academy of Sciences of the United States of America, 113(23), 6454–6459.

Vickrey, W. (1961). Counterspeculation, auctions, and competitive sealed tenders. The Journal of Finance, 16(1), 8–37.

Voelkel, J. G., Stagnaro, M. N., Chu, J., Pink, S. L., Mernyk, J. S., Redekopp, C., … Willer, R. (2022). Megastudy identifying successful interventions to strengthen Americans’ democratic attitudes. Preprint. https://doi.org/10.31219/osf.io/y79u5

Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228–1242.

Watson, G. B. (1928). Do groups think more efficiently than individuals? Journal of Abnormal and Social Psychology, 23(3), 328.

Watts, D. (2017). Response to Turco and Zuckerman's “Verstehen for sociology.” The American Journal of Sociology, 122(4), 1292–1299.

Watts, D. J. (2011). Everything is obvious*: Once you know the answer. Crown Business.

Watts, D. J. (2014). Common sense and sociological explanations. The American Journal of Sociology, 120(2), 313–351.

Watts, D. J. (2017). Should social science be more solution-oriented? Nature Human Behaviour, 1, 15.

Watts, D. J., Beck, E. D., Bienenstock, E. J., Bowers, J., Frank, A., Grubesic, A., … Salganik, M. (2018). Explanation, prediction, and causality: Three sides of the same coin? https://doi.org/10.31219/osf.io/u6vz5

Wiernik, B. M., Raghavan, M., Allan, T., & Denison, A. J. (2022). Generalizability challenges in applied psychological and organizational research and practice. The Behavioral and Brain Sciences, 45, e38.

Witkop, G. (n.d.). Systematizing confidence in open research and evidence (SCORE). DARPA. Retrieved June 22, 2022, from https://www.darpa.mil/program/systematizing-confidence-in-open-research-and-evidence

Wood, R. E. (1986). Task complexity: Definition of the construct. Organizational Behavior and Human Decision Processes, 37(1), 60–82.

Woolley, A. W., Chabris, C. F., Pentland, A., Hashmi, N., & Malone, T. W. (2010). Evidence for a collective intelligence factor in the performance of human groups. Science (New York, N.Y.), 330(6004), 686–688.

Wurman, P. R., Wellman, M. P., & Walsh, W. E. (2001). A parametrization of the auction design space. Games and Economic Behavior, 35(1), 304–338.

Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, E1. https://doi.org/10.1017/S0140525X20001685

Yarkoni, T., Eckles, D., Heathers, J., Levenstein, M., Smaldino, P. E., & Lane, J. I. (2019). Enhancing and accelerating social science via automation: Challenges and opportunities. https://doi.org/10.31235/osf.io/vncwe

Yarkoni, T., & Westfall, J. (2017). Choosing prediction over explanation in psychology: Lessons from machine learning. Perspectives on Psychological Science: A Journal of the Association for Psychological Science, 12(6), 1100–1122.