P-CURVING AS A SAFEGUARD AGAINST P-HACKING IN SLA RESEARCH: A CASE STUDY

Seth Lindstromberg

doi:10.1017/S0272263121000516

Published online by Cambridge University Press: 06 September 2021

Seth Lindstromberg*
Affiliation: Hilderstone College
*Correspondence concerning this article should be addressed to Seth Lindstromberg, Hilderstone College, St Peters Road, Broadstairs, Kent, CT10 2JW, United Kingdom. Email: lindstromberg@gmail.com; sethl@hilderstone.ac.uk

Abstract

It is important to be able to identify research results likely to have been arrived at by means of “p-hacking,” a common term for research and reporting practices (such as the selective reporting of results) that are biased toward finding p < α. This paper discusses and demonstrates “p-curving,” a means of checking a set of primary studies within a specific research stream for signs of p-hacking. A salient feature of p-curving is that it is based entirely on significant p-values. Because of the potential usefulness of p-curving and because it has been little used by SLA researchers, a case study illustrates the construction and analysis of a p-curve as a complement to meta-analysis. The focal p-curve in this study relates to published (quasi)experimental studies that addressed the research hypothesis that, for low- and middle-proficiency learners, L1 glosses facilitate vocabulary learning during reading better than L2 glosses do.
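
As a rough illustration of the idea described in the abstract, the following Python sketch bins a set of significant p-values into a p-curve and applies a simple binomial check for right skew (more values below α/2 than above it). It is a minimal sketch under stated assumptions, not the analysis reported in the article: the p-values are invented for illustration, the function names p_curve_bins and binomial_right_skew are hypothetical, and the binomial check is only one simple way of testing for right skew.

```python
from math import comb


def p_curve_bins(p_values, alpha=0.05, n_bins=5):
    """Count significant p-values in equal-width bins over (0, alpha).

    Non-significant p-values are ignored, since a p-curve is built
    from significant results only.
    """
    significant = [p for p in p_values if 0 < p < alpha]
    width = alpha / n_bins
    counts = [0] * n_bins
    for p in significant:
        index = min(int(p / width), n_bins - 1)
        counts[index] += 1
    return counts


def binomial_right_skew(p_values, alpha=0.05):
    """One-sided binomial check for right skew.

    If there were no true effect and no selective reporting, significant
    p-values would be expected to fall below alpha / 2 about half the time.
    Returns (number below alpha / 2, number significant, tail probability
    of observing at least that many under a 50/50 split).
    """
    significant = [p for p in p_values if 0 < p < alpha]
    n = len(significant)
    low = sum(1 for p in significant if p < alpha / 2)
    tail = sum(comb(n, k) for k in range(low, n + 1)) / 2 ** n
    return low, n, tail


# Invented p-values, for illustration only (not data from any study).
example_p_values = [0.001, 0.004, 0.008, 0.013, 0.022, 0.041, 0.30]

print(p_curve_bins(example_p_values))
print(binomial_right_skew(example_p_values))
```

In the p-curve literature, a right-skewed curve (many very small p-values) is generally read as a sign of evidential value, whereas a flat or left-skewed curve invites closer scrutiny. With as few p-values as in this toy example, no firm conclusion would be warranted either way.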

Type: Methods Forum
Information: Studies in Second Language Acquisition, Volume 44, Issue 4, September 2022, pp. 1155–1180

DOI: https://doi.org/10.1017/S0272263121000516
Copyright: © The Author(s), 2021. Published by Cambridge University Press

Footnotes

I am grateful to the lead authors of Kim et al. (2020) and Yanagisawa et al. (2020) for answering questions and to the lead author of Kim et al. for supplying two papers that I could find no other way of obtaining. At various stages of its development this paper benefited immensely from comments, suggestions, and corrections from Frank Boers, Tessa Woodward, three anonymous reviewers, and an editor.

References

Primary Studies Included in Case Study

Arpaci, D. (2016). The effects of accessing L1 versus L2 definitional glosses on L2 learners’ reading comprehension and vocabulary learning. Eurasian Journal of Applied Linguistics, 2, 15–29. https://www.ejal.info/index.php/ejal/issue/view/10

Ertürk, Z. (2016). The effect of glossing on EFL learners’ incidental vocabulary learning in reading. Procedia: Social and Behavioral Sciences, 232, 373–381. https://doi.org/10.1016/j.sbspro.2016.10.052

Farvardin, M., & Biria, R. (2012). The impact of gloss types on Iranian EFL students’ reading comprehension and lexical retention. International Journal of Instruction, 5, 99–114. http://www.e-iji.net/volumes/317-january-2012-volume-5-number-1

Ko, M.-H. (1995). Glossing in incidental and intentional learning of foreign language vocabulary and reading. University of Hawai’i Working Papers in ESL, 13, 49–94. https://scholarspace.manoa.hawaii.edu/handle/10125/40761

Ko, M.-H. (2017). The relationship between gloss type and L2 proficiency in incidental vocabulary learning. Modern English Education, 18, 47–69. http://www.dbpia.co.kr/Article/NODE07255877

Mitarai, Y., & Aizawa, K. (1999). The effects of different types of glosses in vocabulary learning and reading comprehension. ARELE: Annual Review of English Language Education in Japan, 10, 73–82.

Öztürk, M., & Yorgancı, M. (2017). Effects of L1 and L2 glosses on incidental vocabulary learning of EFL prep students. Turkish Studies: International Periodical for the Languages, Literature and History of Turkish or Turkic, 12, 635–656. http://doi.org/10.7827/TurkishStudies.11432

Pishghadam, R., & Ghahari, S. (2011). The impact of glossing on incidental vocabulary learning: A comparative study. Iranian EFL Journal, 7, 8–29. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.965.6528&rep=rep1&type=pdf

Shiki, O. (2008). Effects of glosses on incidental vocabulary learning: Which gloss-type works better, L1, L2, single choice, or multiple choices for Japanese university students? Journal of Inquiry and Research, 87, 39–56. http://doi.org/10.18956/00006209

Yoshii, M. (2006). L1 and L2 glosses: Their effects on incidental vocabulary learning. Language Learning & Technology, 10, 85–101. https://www.lltjournal.org/item/2563

References

Anscombe, F. (1973). Graphs in statistical analysis. The American Statistician, 27, 17–21. https://doi.org/10.2307/2682899

Arnholt, A. (2017). R package BSDA (Basic statistics and data analysis), Version 1.2.0. (Computer freeware). https://www.rdocumentation.org/packages/BSDA/versions/1.2.0

Bakker, M., & Wicherts, J. (2011). The (mis)reporting of statistical results in psychology journals. Behavior Research Methods, 43, 666–678. https://doi.org/10.3758/s13428-011-0089-5

Bakker, M., & Wicherts, J. (2014). Outlier removal, sum scores, and the inflation of the type I error rate in independent samples t tests: The power of alternatives and recommendations. Psychological Methods, 19, 409–427. https://doi.org/10.1037/met0000014

Barcroft, J. (2015). Lexical input processing and vocabulary learning. John Benjamins.

Bishop, D., & Thompson, P. (2016). Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value. PeerJ, 4:e1715. https://doi.org/10.7717/peerj.1715

BITSS (Berkeley Initiative for Transparency in the Social Sciences). (2017). P-curve: A tool for detecting publication bias. https://www.bitss.org/p-curve-a-tool-for-detecting-publication-bias/

Boers, F. (in press). Glossing and vocabulary learning. Language Teaching. https://doi.org/10.1017/S0261444821000252

Borenstein, M., Hedges, L., Higgins, J., & Rothstein, H. (2009). Introduction to meta-analysis. Wiley.

Carter, E., Schönbrodt, F., Gervais, W., & Hilgard, J. (2019). Correcting for bias in psychology: A comparison of meta-analytic methods. Advances in Methods and Practices in Psychological Science, 2, 115–144. https://doi.org/10.1177/2515245919847196

Chambers, C. (2017). The seven deadly sins of psychology: A manifesto for reforming the culture of scientific practice. Princeton University Press.

Coburn, K., & Vevea, J. (2016). weightr: Estimating weight-function models for publication, Version 2.0.2. (Computer freeware). https://CRAN.R-project.org/package=weightr

Dick, A., Garcia, N., Pruden, S., Thompson, W., Hawes, S., Sutherland, M., Riedel, M., Laird, A., & Gonzalez, R. (2019). No evidence for a bilingual executive function advantage in the ABCD study. Nature Human Behaviour, 3, 692–701. https://doi.org/10.1038/s41562-019-0609-3

Duval, S., & Tweedie, R. (2000). Trim and fill: A simple funnel-plot–based method of testing and adjusting for publication bias in meta-analysis. Biometrics, 56, 455–463. https://doi.org/10.1111/j.0006-341X.2000.00455.x

Egger, M., Smith, G., Schneider, M., & Minder, C. (1997). Bias in meta-analysis detected by a simple, graphical test. British Medical Journal, 315, 629–634. https://doi.org/10.1136/bmj.315.7109.629

Erdfelder, E., & Heck, D. (2019). P-curve: A word of caution. Zeitschrift für Psychologie, 227, 249–260. https://doi.org/10.1027/a000001

Fidler, F., & Wilcox, J. (2018). Reproducibility of scientific results. Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/entries/scientific-reproducibility/

Gelman, A. (2018). The p-curve, p-uniform, and Hedges (1984). Methods for meta-analysis under p-hacking: An exchange with Blake McShane, Uri Simosohn, and Marcel van Assen. Stat modeling, causal inference, and social science, 26 February. https://statmodeling.stat.columbia.edu/2018/02/26/p-curve-p-uniform-hedges-1984-methods-meta-analysis-selection-bias-exchange-blake-mcshane-uri-simosohn/

Gelman, A., & Loken, E. (2013). The garden of forking paths: Why multiple comparisons can be a problem, even when there is no “fishing expedition” or “p-hacking” and the research hypothesis was posited ahead of time. Department of Statistics, Columbia University. http://www.stat.columbia.edu/~gelman/research/unpublished/p_hacking.pdf

Hartgerink, C. (2017). Reanalyzing Head et al. (2015): Investigating the robustness of widespread p-hacking. PeerJ Preprints, 5, e3068. https://doi.org/10.7717/peerj.3068

Head, M., Holman, L., Lanfear, R., Kahn, A., & Jennions, M. (2015). The extent and consequences of p-hacking in science. PLOS Biology, 13, e1002106. https://doi.org/10.1371/journal.pbio.1002106

Hedges, L. (1992). Modeling publication selection effects in meta-analysis. Statistical Science, 7, 246–255. https://projecteuclid.org/euclid.ss/1177011364

John, L., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23, 524–532. https://doi.org/10.1177/0956797611430953

Kazerouni, Z., & Rassaei, E. (2016). The effects of L1 and L2 glossing on the retention of L2 vocabulary in intentional and incidental settings. Journal of Studies in Learning and Teaching English, 5, 119–150. http://jslte.iaushiraz.ac.ir/issue_112611_112618.html

Kim, H., Lee, J., & Lee, H. (2020). The relative effects of L1 and L2 glosses on L2 learning: A meta-analysis. Language Teaching Research. Advance view. https://doi.org/10.1177/1362168820981394

Lakens, D. (2014). What p-hacking really looks like: A comment on Masicampo & Lalande (2012). Quarterly Journal of Experimental Psychology A, 68, 829–832. https://doi.org/10.1080/17470218.2014.982664

Lakens, D. (2018). Professors are not elderly: Evaluating the evidential value of two social priming effects through p-curve analyses. Eindhoven University of Technology. https://psyarxiv.com/3m5y9/

Lakens, D. (2021). Sample size justification. PsyArXiv. https://psyarxiv.com/9d3yf/

Lakens, D., Scheel, A., & Isager, P. (2018). Equivalence testing for psychological research. Advances in Methods and Practices in Psychological Science, 1, 259–269. https://doi.org/10.1177/2515245918770963

Light, R., & Pillemer, D. (1984). Summing up: The science of reviewing research. Harvard University Press.

Linck, J., & Cunnings, I. (2015). The utility and application of mixed-effects models in second language research. Language Learning, 65, 185–207. https://doi.org/10.1111/lang.12117

Lindstromberg, S. (2016). Inferential statistics in Language Teaching Research: A review and ways forward. Language Teaching Research, 20, 741–768. https://doi.org/10.1177/1362168816649979

McShane, B., Böckenholt, U., & Hansen, K. (2016). Adjusting for publication bias in meta-analysis: An evaluation of selection methods and some cautionary notes. Perspectives on Psychological Science, 11, 730–749. https://doi.org/10.1177/1745691616662243

Norris, J. (2015). Statistical significance testing in second language research: Basic problems and suggestions for reform. Language Learning, 65, 97–126. https://doi.org/10.1111/lang.12114

Plonsky, L. (2013). Study quality in SLA: An assessment of designs, analyses, and reporting practices in quantitative L2 research. Studies in Second Language Acquisition, 35, 655–687. https://doi.org/10.1017/S0272263113000399

Plonsky, L., & Gass, S. (2011). Quantitative research methods, study quality, and outcomes: The case of interaction research. Language Learning, 61, 325–366. https://doi.org/10.1111/j.1467-9922.2011.00640.x

Plonsky, L., & Oswald, F. (2014). How big is “big”? Interpreting effect sizes in L2 research. Language Learning, 64, 878–912. https://doi.org/10.1111/lang.12079

Plonsky, L., Sudina, E., & Hu, Y. (2021). Applying meta-analysis to research on bilingualism: An introduction. Bilingualism: Language and Cognition. Advance online publication. https://doi.org/10.1017/S1366728920000760

Pollet, T., & van der Meij, L. (2017). To remove or not to remove: The impact of outlier handling on significance testing in testosterone data. Adaptive Human Behavior and Physiology, 3, 43–60. https://doi.org/10.1007/s40750-016-0050-z

Roettger, T. (2019). Researcher degrees of freedom in phonetic research. Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10, 1–27. https://doi.org/10.5334/labphon.147

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86, 638–641. https://doi.org/10.1037/0033-2909.86.3.638

Rothstein, H., Sutton, A., & Borenstein, M., Eds. (2005). Publication bias in meta‐analysis: Prevention, assessment and adjustments. Wiley.

Simmons, J., Nelson, L., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359–1366. http://doi.org/10.1177/0956797611417632

Simonsohn, U., Nelson, L., & Simmons, J. (2014a). P-curve: A key to the file drawer. Journal of Experimental Psychology: General, 143, 534–547. http://doi.org/10.1037/a0033242

Simonsohn, U., Nelson, L., & Simmons, J. (2014b). P-curve and effect size: Correcting for publication bias using only significant results. Perspectives on Psychological Science, 9, 666–681. https://doi.org/10.1177/1745691614553988

Simonsohn, U., Simmons, J., & Nelson, L. (2015). Better p-curves: Making p-curve analysis more robust to errors, fraud, and ambitious p-hacking, A reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144, 1146–1152. http://doi.org/10.1037/xge0000104

Simonsohn, U., Nelson, L., & Simmons, J. (2017). P-curve app 4.06. (Computer freeware). http://www.p-curve.com/app4/

Simonsohn, U., Nelson, L., & Simmons, J. (2019). P-curve won’t do your laundry, but it will distinguish replicable from non-replicable findings in observational research: Comment on Bruns & Ioannidis (2016). PLoS ONE, 14, e0213454. https://doi.org/10.1371/journal.pone.0213454

van Aert, R. (2021). puniform: Meta-analysis methods correcting for publication bias, Version 0.2.4. (Computer freeware). https://github.com/RobbievanAert/puniform

van Aert, R., & van Assen, M. (2021). Correcting for publication bias in a meta-analysis with the p-uniform* method. Open Science Framework. https://doi.org/10.31222/osf.io/zqjr9

van Assen, M., van Aert, R., & Wicherts, J. (2015). Meta-analysis using effect size distributions of only statistically significant studies. Psychological Methods, 20, 293–309. http://doi.org/10.1037/met0000025

van Aert, R., Wicherts, J., & van Assen, M. (2019). Publication bias examined in meta-analyses from psychology and medicine: A meta-meta-analysis. PLoS ONE, 14, e0215052. https://doi.org/10.1371/journal.pone.0215052

Vitta, J., & Al-Hoorie, A. (2020). The flipped classroom in second language learning: A meta-analysis. Language Teaching Research. Advance view. https://doi.org/10.1177/1362168820981403

Vogel, D., & Homberg, F. (2020). P‐hacking, p‐curves, and the PSM–performance relationship: Is there evidential value? Public Administration Review, 81, 191–204. http://doi.org/10.1111/puar.13273

Westfall, J. (2016). Five different “Cohen’s d” statistics for within-subject designs. Cookie Scientist: Designing experiments and analyzing data. 25 March. http://jakewestfall.org/blog/index.php/2016/03/25/five-different-cohens-d-statistics-for-within-subject-designs/

Wicherts, J., Veldkamp, C., Augusteijn, H., Bakker, M., van Aert, R., & van Assen, M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. https://www.frontiersin.org/article/10.3389/fpsyg.2016.01832

Yanagisawa, A., Webb, S., & Uchihara, T. (2020). How do different forms of glossing contribute to L2 vocabulary learning from reading? A meta-regression analysis. Studies in Second Language Acquisition, 42, 411–438. https://doi.org/10.1017/S0272263119000688
