
Part I - Basic Concepts and Uses of Validity Argument in Language Testing and Assessment

Published online by Cambridge University Press: 14 January 2021

Carol A. Chapelle
Affiliation: Iowa State University
Erik Voss
Affiliation: Teachers College, Columbia University

Type: Chapter
Book: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 17–70
Publisher: Cambridge University Press
Print publication year: 2021


