
3 - A Systematic Review of Argument-Based Validation Studies in the Field of Language Testing (2000–2018)

from Part I - Basic Concepts and Uses of Validity Argument in Language Testing and Assessment

Published online by Cambridge University Press: 14 January 2021

Carol A. Chapelle, Iowa State University
Erik Voss, Teachers College, Columbia University

Summary

Since the publication of Kane (2006) on argument-based validation and the validation project by Chapelle, Enright, and Jamieson (2008), a trend of employing an argument-based approach in language testing validation research has emerged, as observed by Chapelle and Voss (2013). To better understand this trend, this systematic review identified and analyzed the argument-based validation studies published from 2000 to 2018. A comprehensive literature search was conducted with multiple search terms (e.g., validity, argument-based validation, inferences) across a variety of publication sources, including peer-reviewed academic journals, research reports, and dissertations. After applying pre-established inclusion criteria, 70 studies were retained, including 45 journal articles or research reports and 25 doctoral dissertations. The claims and inferences employed in these studies were analyzed into themes and categorized under the framework of Chapelle, Enright, and Jamieson (2008). In addition, the research methodology addressing the warrants, rebuttals, and backing in each study was documented and reviewed. Based on the results of this analysis, we make suggestions about constructing interpretation and use arguments as well as evaluating the coherence and plausibility of the validity arguments in various testing contexts.

Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 45–70
Publisher: Cambridge University Press
Print publication year: 2021

References

Barkaoui, K. (2014). Examining the impact of L2 proficiency and keyboarding skills on scores on TOEFL-iBT writing tasks. Language Testing, 31(2), 241–259. https://doi.org/10.1177/0265532213509810
Barkaoui, K. (2015). Test takers’ writing activities during the TOEFL iBT® writing tasks: A stimulated recall study. ETS Research Report Series, (1), 1–42. https://doi.org/10.1002/ets2.12050
Barkaoui, K., & Knouzi, I. (2018). The effects of writing mode and computer ability on L2 test-takers’ essay characteristics and scores. Assessing Writing, 46, 19–31. https://doi.org/10.1016/j.asw.2018.02.005
Becker, A. P. (2011). Building evidence for the evaluation of English learners’ writing scores. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Becker, A. (2018). Not to scale? An argument-based inquiry into the validity of an L2 writing rating scale. Assessing Writing, 37, 1–12. https://doi.org/10.1016/j.asw.2018.01.001
Bejar, I. I., Deane, P. D., Flor, M., & Chen, J. (2017). Evidence of the generalization and construct representation inferences for the GRE® revised General Test sentence equivalence item type. ETS Research Report Series, (1), 1–25. https://doi.org/10.1002/ets2.12134
Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT: A lexico-grammatical analysis. ETS TOEFL Research Report Series.
Bogorevich, V. (2018). Native and non-native raters of L2 speaking performance: Accent familiarity and cognitive processes. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52. https://doi.org/10.1177/0265532215576380
Chapelle, C. A., Chung, Y.-R., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469. https://doi.org/10.1177/0265532210367633
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405. https://doi.org/10.1177/0265532214565386
Checa-García, I., & Guiberson, M. (2019). Test validity in morphosyntactic measures for typical and SLI incipient Spanish–English bilinguals. Language Testing, 36(1), 77–100. https://doi.org/10.1177/0265532217724603
Cheng, L., & Sun, Y. (2015). Interpreting the impact of the Ontario Secondary School Literacy Test on second language students within an argument-based validation framework. Language Assessment Quarterly, 12(1), 50–66.
Chung, Y.-R. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Deygers, B., van den Branden, K., & van Gorp, K. (2018). University entrance language tests: A matter of justice. Language Testing, 35(4), 449–476. https://doi.org/10.1177/0265532217706196
Doe, C. D. (2013). Validating the Canadian academic English language assessment for diagnostic purposes from three perspectives: Scoring, teaching, and learning. Unpublished doctoral dissertation, Queen’s University, Kingston, ON.
Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334.
Esfandiari, M. R., Riasati, M. J., Vaezian, H., & Rahimi, F. (2018). A quantitative analysis of TOEFL iBT using an interpretive model of test validity. Language Testing in Asia, 8(1), 7. https://doi.org/10.1186/s40468-018-0062-7
Frost, K., Elder, C., & Wigglesworth, G. (2011). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345–369. https://doi.org/10.1177/0265532211424479
Gaillard, S. (2014). The elicited imitation task as a method for French proficiency assessment in institutional and research settings. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.
Gu, L., Lockwood, J., & Powers, D. E. (2015). Evaluating the TOEFL Junior® standard test as a measure of progress for young English language learners. ETS Research Report Series. https://doi.org/10.1002/ets2.12064
Harsch, C., Ushioda, E., & Ladroue, C. (2017). Investigating the predictive validity of TOEFL iBT® test scores and their use in informing policy in a United Kingdom University setting. ETS Research Report Series, (1), 1–80. https://doi.org/10.1002/ets2.12167
He, L., & Min, S. (2017). Development and validation of a computer adaptive EFL test. Language Assessment Quarterly, 14(2), 160–176. https://doi.org/10.1080/15434303.2016.1162793
Isbell, D. R. (2017). Assessing C2 writing ability on the Certificate of English Language Proficiency: Rater and examinee age effects. Assessing Writing, 34, 37–49. https://doi.org/10.1016/j.asw.2017.08.004
Jia, Y. (2013). Justifying the use of a second language oral test as an exit test in Hong Kong: An application of assessment use argument framework. Unpublished doctoral dissertation, University of California, Los Angeles.
Johnson, R. C. (2011). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Unpublished doctoral dissertation, Macquarie University, Sydney, Australia.
Johnson, R. C., & Riazi, A. M. (2015). Accuplacer Companion in a foreign language context: An argument-based validation of both test score meaning and impact. Papers in Language Testing and Assessment, 4(1), 31–58.
Jun, H. S. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Kadir, A. K. (2008). Framing a validity argument for test use and impact: The Malaysian public service experience. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.
Kelly-Riley, D., & Elliot, N. (2014). The WPA Outcomes Statement, validation, and the pursuit of localism. Assessing Writing, 21, 89–103.
Kim, E.-Y. J. (2017). The TOEFL iBT writing: Korean students’ perceptions of the TOEFL iBT writing test. Assessing Writing, 33, 1–11. https://doi.org/10.1016/J.ASW.2017.02.001
Klebanov, B., Ramineni, C., Kaufer, D., Yeoh, P., & Ishizaki, S. (2017). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125–144. https://doi.org/10.1177/0265532217740752
Knoch, U., Macqueen, S., & O’Hagan, S. (2014). An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® Writing Test. ETS Research Report Series. https://doi.org/10.1002/ets2.12038
Koizumi, R., In’nami, Y., Asano, K., & Agawa, T. (2016). Validity evidence of Criterion® for assessing L2 writing proficiency in a Japanese university context. Language Testing in Asia, 6(5), 1–26. https://doi.org/10.1186/s40468-016-0027-7
Kumazawa, T., Shizuka, T., Mochizuki, M., & Mizumoto, M. (2016). Validity argument for the VELC Test® score interpretations and uses. Language Testing in Asia, 16, 1.
Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33(3), 319–340. https://doi.org/10.1177/0265532215587391
LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475. https://doi.org/10.1177/0265532217713951
Lallmamode, S. P., Mat Daud, N., & Abu Kassim, N. L. (2016). Development and initial argument-based validation of a scoring rubric used in the assessment of L2 writing electronic portfolios. Assessing Writing, 30, 44–62. https://doi.org/10.1016/j.asw.2016.06.001
Lesnov, R. (2018). The role of content-rich visuals in the L2 academic listening assessment construct. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.
Li, S. (2018). Developing a test of L2 Chinese pragmatic comprehension ability. Language Testing in Asia, 8(1), 3. https://doi.org/10.1186/s40468-018-0054-7
Li, Z. (2015a). Using a self-assessment of English use as a tool to validate the English Placement Test. Papers in Language Testing and Assessment, 3(2), 59–96.
Li, Z. (2015b). An argument-based validation study of the English Placement Test (EPT): Focusing on the inferences of extrapolation and ramification. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, MI.
Link, S. M. (2015). Development and validation of an automated essay scoring engine to assess students’ development across program levels. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Llosa, L. (2005). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency. Unpublished doctoral dissertation, University of California, Los Angeles.
Llosa, L. (2007). Validating a standards-based classroom assessment of English proficiency: A multitrait-multimethod approach. Language Testing, 24(4), 489–515.
Llosa, L., & Malone, M. E. (2018). Comparability of students’ writing performance on TOEFL iBT and in required university writing courses. Language Testing. https://doi.org/10.1177/0265532218763456
Mendoza, A., & Knoch, U. (2018). Examining the validity of an analytic rating scale for a Spanish test for academic purposes using the argument-based approach to validation. Assessing Writing, 35, 41–55. https://doi.org/10.1016/j.asw.2017.12.003
Mozgalina, A. (2015). Applying an argument-based approach for validating language proficiency assessments in second language acquisition research: The elicited imitation test for Russian. Unpublished doctoral dissertation, Georgetown University, Washington, DC.
Oh, S. R. (2018). Investigating test-takers’ use of linguistic tools in second language academic writing assessment. Unpublished doctoral dissertation, Teachers College, Columbia University, New York, NY.
Pardo-Ballester, C. (2007). The development of a web-based Spanish listening placement exam. Unpublished doctoral dissertation, University of California, Davis, CA.
Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly, 7(2), 137–159.
Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Riazi, A. M. (2016). Comparing writing performance in TOEFL-iBT and academic assignments: An exploration of textual features. Assessing Writing, 28, 15–27. https://doi.org/10.1016/j.asw.2016.02.001
Santos, V. (2017). A computer-adaptive test of productive and contextualized academic vocabulary breadth in English (CAT-PAV): Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Sawaki, Y., & Sinharay, S. (2013). Investigating the value of section scores for the TOEFL iBT® TEST. ETS Research Report Series, (2), i–113. https://doi.org/10.1002/j.2333-8504.2013.tb02342.x
Sawaki, Y., & Sinharay, S. (2018). Do the TOEFL iBT® section scores provide value-added information to stakeholders? Language Testing, 35(4), 529–556. https://doi.org/10.1177/0265532217716731
Schmidgall, J. E. (2017). The consistency of TOEIC® speaking scores across ratings and tasks. ETS Research Report Series, (1), 1–8. https://doi.org/10.1002/ets2.12178
Schmidgall, J. E., Getman, E. P., & Zu, J. (2018). Screener tests need validation too: Weighing an argument for test use against practical concerns. Language Testing, 35(4), 583–607. https://doi.org/10.1177/0265532217718600
Sims, J. M., & Kunnan, A. J. (2016). Developing evidence for a validity argument for an English placement exam from multi-year test performance data. Language Testing in Asia, 6(1), 1. https://doi.org/10.1186/s40468-016-0024-x
Tominaga, W. (2014). Validating the scoring inference of the Japanese OPI ratings: The use of extended turns, connective expressions, and discourse organization. Unpublished doctoral dissertation, University of Hawai’i at Manoa.
Trace, J. (2017). A validation argument for cloze test item function in second language assessment. Unpublished doctoral dissertation, University of Hawai’i at Manoa.
Trace, J., Janssen, G., & Meier, V. (2017). Measuring the impact of rater negotiation in writing performance assessment. Language Testing, 34(1), 3–22. https://doi.org/10.1177/0265532215594830
Wang, H. (2010). Investigating the justifiability of an additional test use: An application of assessment use argument to an English as a foreign language test. Unpublished doctoral dissertation, University of California, Los Angeles.
Weigle, S. C. (2011). Validation of automated scores of TOEFL iBT® tasks against nontest indicators of writing ability. ETS Research Report Series, (2), i–63.
Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Youn, S. J. (2013). Validating task-based assessment of L2 pragmatics in interaction using mixed methods. Unpublished doctoral dissertation, University of Hawai’i at Manoa.
Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225. https://doi.org/10.1177/0265532214557113
Xi, X., Higgins, D., Zechner, K., & Williamson, D. M. (2008). Automated scoring of spontaneous speech using SpeechRater v.1.0. ETS Research Report Series, (2).

References

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1–34. https://doi.org/10.1207/s15434311laq0201_1
Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford: Oxford University Press.
Brennan, R. L. (2013). Commentary on “Validating the interpretations and uses of test scores.” Journal of Educational Measurement, 50(1), 74–83. https://doi.org/10.1111/jedm.12001
Chapelle, C. A. (2021). Argument-based validation in testing and assessment. Thousand Oaks, CA: Sage Publications.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2008). Building a validity argument for the Test of English as a Foreign Language. New York and London: Routledge.
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13.
Chapelle, C. A., & Voss, E. (2013). Evaluation of language tests through validation research. In Kunnan, A. J. (Ed.), The companion to language assessment III:9:65 (pp. 1079–1097). Chichester: John Wiley and Sons, Inc.
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.
Kane, M. T. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1
Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood Publishing.
Kane, M. T. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3–17.
Kane, M. T. (2013a). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. http://doi.org/10.1111/jedm.12000
Kane, M. T. (2013b). Validation as a pragmatic, scientific activity. Journal of Educational Measurement, 50(1), 115–122. http://doi.org/10.1111/jedm.12007
Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192
Messick, S. (1989). Validity. In Linn, R. (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education.
Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41(4), 805–815. https://doi.org/10.1002/j.1545-7249.2007.tb00105.x
Siddaway, A. P., Wood, A. M., & Hedges, L. V. (2019). How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annual Review of Psychology, 70(1), 747–770. https://doi.org/10.1146/annurev-psych-010418-102803
Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.
