Basic Concepts and Uses of Validity Argument in Language Testing and Assessment

doi:10.1017/9781108669849.003

Aryadoust, V. (2011). Validity arguments of the speaking and listening modules of international English language testing system: A synthesis of existing research. Asian ESP Journal, 7(2), 28–54.

Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency. Newcastle upon Tyne: Cambridge Scholars Publishing.

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1–34.

Bachman, L. F., & Palmer, A. (1996). Language testing in practice. Oxford: Oxford University Press.

Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.

Barkaoui, K. (2017). Examining repeaters’ performance on second language proficiency tests: A review and a call for research. Language Assessment Quarterly, 14(4), 420–431.

Brooks, L., & Swain, M. (2014). Contextualizing performances: Comparing performances during TOEFL iBT™ and real-life academic speaking activities. Language Assessment Quarterly, 11(4), 353–373.

Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52.

Chapelle, C. A. (1998). Construct definition and validity inquiry in SLA research. In Bachman, L. F. & Cohen, A. D. (Eds.), Second language acquisition and language testing interfaces (pp. 32–70). Cambridge: Cambridge University Press.

Chapelle, C. A. (1999). Validity in language assessment. Annual Review of Applied Linguistics, 19, 254–272.

Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple… Language Testing, 29(1), 19–27.

Chapelle, C. A., Chung, Y.-R., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469.

Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405.

Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (2008). Building a validity argument for the Test of English as a Foreign Language™. New York: Routledge.

Chapelle, C. A., & Voss, E. (2013). Evaluation of language tests through validation research. In Kunnan, A. J. (Ed.), The companion to language assessment (pp. 1079–1097). Chichester: Wiley.

Cheng, L., & Sun, Y. (2015). Interpreting the impact of the Ontario Secondary School Literacy Test on second language students within an argument-based validation framework. Language Assessment Quarterly, 12(1), 50–66.

Choi, Y. (2018). Graphic-prompt tasks for assessment of academic English writing ability: An argument-based approach to investigating validity. Unpublished doctoral dissertation, Iowa State University.

Chung, Y.-R. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University.

Colby-Kelly, C., & Turner, C. (2007). AFL research in the L2 classroom and evidence of usefulness: Taking formative assessment to the next level. Canadian Modern Language Review, 64(1), 9–37.

Creswell, J., & Plano Clark, V. (2017). Designing and conducting mixed methods research (3rd ed.). Thousand Oaks, CA: Sage Publications.

Cronbach, L. J. (1971). Test validation. In Thorndike, R. L. (Ed.), Educational measurement (pp. 443–507). Washington, DC: American Council on Education.

Cronbach, L. J. (1988). Internal consistency of tests: Analyses old and new. Psychometrika, 53(1), 63–70.

Doe, C. D. (2013). Validating the Canadian academic English language assessment for diagnostic purposes from three perspectives: Scoring, teaching, and learning. Unpublished doctoral dissertation, Queen’s University.

Doe, C. D. (2015). Student interpretations of diagnostic feedback. Language Assessment Quarterly, 12(1), 110–135.

Educational Testing Service (ETS). (2018). Validity evidence supporting the interpretation and use of TOEFL iBT® scores. TOEFL® Research Insight Series, Volume 4. Princeton, NJ: Educational Testing Service.

Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334.

Farnsworth, T. L. (2013). An investigation into the validity of the TOEFL iBT Speaking test for international teaching assistant certification. Language Assessment Quarterly, 10(3), 274–291.

Frost, K., Elder, C., & Wigglesworth, G. (2012). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345–369.

Fulcher, G., & Davidson, F. (2009). Test architecture, test retrofit. Language Testing, 26(1), 123–144.

He, L., & Min, S. (2017). Development and validation of a computer adaptive EFL test. Language Assessment Quarterly, 14(2), 160–176.

Im, G.-H., & Cheng, L. (2019). The Test of English for International Communication (TOEIC®). Language Testing, 36(2), 315–324.

Jia, Y. (2013). Justifying the use of a second language oral test as an exit test in Hong Kong: An application of assessment use argument framework. Unpublished doctoral dissertation, University of California, Los Angeles.

Johnson, R. C. (2011). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Unpublished doctoral dissertation, Macquarie University.

Jun, H. S. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38, 319–342.

Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood Publishing.

Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73.

Kenyon, D. (2012). Using Bachman’s assessment use argument as a tool in conceptualizing the issues surrounding linking ACTFL and CEFR. In Tschirner, E. (Ed.), Aligning frameworks of reference in language testing: The ACTFL Proficiency Guidelines and the Common European Framework of Reference for Languages (pp. 23–34). Tübingen, Germany: Stauffenburg Verlag.

Kim, Y.-H. (2010). An argument-based validity inquiry into the Empirically-derived Descriptor-based Diagnostic (EDD) assessment in ESL academic writing. Unpublished doctoral dissertation, University of Toronto.

Klebanov, B. B., Ramineni, C., Kaufer, D., Yeoh, P., & Ishizaki, S. (2019). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125–144.

Koizumi, R., Sakai, H., Ido, T., Ota, H., Hayama, M., Sato, M., & Nemoto, A. (2011). Toward validity argument for test interpretation and use based on scores of a diagnostic grammar test for Japanese learners of English. Japanese Journal for Research on Testing (『日本テスト学会誌』), 7(1), 99–119.

LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475.

Lee, J. (2016). Transfer from ESL academic writing to first year composition and other disciplinary courses: An assessment perspective. Unpublished doctoral dissertation, Iowa State University.

Li, Z. (2015). An argument-based validation study of the English Placement Test (EPT): Focusing on the inferences of extrapolation and ramification. Unpublished doctoral dissertation, Iowa State University.

Llosa, L. (2005). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency. Unpublished doctoral dissertation, University of California, Los Angeles.

Llosa, L. (2008). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency based on teacher judgments. Educational Measurement: Issues and Practice, 27(3), 32–42.

Llosa, L., & Malone, M. E. (2019). Comparability of students’ writing performance on TOEFL iBT and in required university writing courses. Language Testing, 36(2), 235–263.

McNamara, T., & Roever, C. (2006). Language testing: The social dimension. Oxford: Blackwell Publishing.

Messick, S. (1989). Validity. In Linn, R. L. (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: Macmillan Publishing Co.

Mislevy, R. J., & Haertel, G. D. (2006). Implications of evidence-centered design for educational testing. Educational Measurement: Issues and Practice, Winter, 6–20.

Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–62.

Norris, J. M. (2008). Validity evaluation in language assessment. New York: Peter Lang.

Pan, M., & Qian, D. D. (2017). Embedding corpora into the content validation of the grammar test of the National Matriculation English Test (NMET) in China. Language Assessment Quarterly, 14(2), 120–139.

Papageorgiou, S., & Tannenbaum, R. J. (2016). Situating standard setting within argument-based validity. Language Assessment Quarterly, 13(2), 109–123.

Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly, 7(2), 137–159.

Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Unpublished doctoral dissertation, Iowa State University.

Plakans, L., & Burke, M. (2013). The decision-making process in language program placement: Test and nontest factors interacting in context. Language Assessment Quarterly, 10(2), 115–134.

Roever, C. (2011). Testing of second language pragmatics: Past and future. Language Testing, 28(4), 463–481.

Sawaki, Y., & Sinharay, S. (2018). Do the TOEFL iBT® section scores provide value-added information to stakeholders? Language Testing, 35(4), 529–556.

Schmidgall, J. E., Getman, E. P., & Zu, J. (2018). Screener tests need validation too: Weighing an argument for test use against practical concerns. Language Testing, 35(4), 583–607.

Schmidgall, J. E., & Xi, X. (2020). Validation of language assessments. In Chapelle, C. A. (Ed.), Concise encyclopedia of applied linguistics (pp. 1123–1135). Oxford: Wiley-Blackwell.

So, Y. (2014). Are teacher perspectives useful? Incorporating EFL teacher feedback in the development of a large-scale international English test. Language Assessment Quarterly, 11(3), 283–303.

Suzuki, Y. (2015). Self-assessment of Japanese as a second language: The role of experiences in the naturalistic acquisition. Language Testing, 32(1), 63–81.

Toulmin, S. E. (2003). The uses of argument. Cambridge: Cambridge University Press.

Vongpumivitch, V. (2010). The General English Proficiency Test. In Cheng, L. & Curtis, A. (Eds.), English language assessment and the Chinese learner (pp. 158–172). New York: Routledge.

Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University.

Wang, H., Choi, I., Schmidgall, J., & Bachman, L. F. (2012). Review of Pearson Test of English Academic. Language Testing, 29(4), 603–619.

Weigle, S. C., Yang, W., & Montee, M. (2013). Exploring reading processes in an academic reading test using short-answer questions. Language Assessment Quarterly, 10(1), 28–48.

Weir, C. J. (2005). Language testing and validation: An evidence-based approach. Basingstoke: Palgrave Macmillan.

Xi, X. (2008). Methods of test validation. In Shohamy, E. & Hornberger, N. H. (Eds.), Encyclopedia of language and education, 2nd edition, Volume 7: Language testing and assessment (pp. 177–196). New York: Springer.

Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.

Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University.

Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225.

Barkaoui, K. (2014). Examining the impact of L2 proficiency and keyboarding skills on scores on TOEFL-iBT writing tasks. Language Testing, 31(2), 241–259. https://doi.org/10.1177/0265532213509810

Barkaoui, K. (2015). Test takers’ writing activities during the TOEFL iBT® writing tasks: A stimulated recall study. ETS Research Report Series, (1), 1–42. https://doi.org/10.1002/ets2.12050

Barkaoui, K., & Knouzi, I. (2018). The effects of writing mode and computer ability on L2 test-takers’ essay characteristics and scores. Assessing Writing, 46, 19–31. https://doi.org/10.1016/j.asw.2018.02.005

Becker, A. P. (2011). Building evidence for the evaluation of English learners’ writing scores. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Becker, A. (2018). Not to scale? An argument-based inquiry into the validity of an L2 writing rating scale. Assessing Writing, 37, 1–12. https://doi.org/10.1016/j.asw.2018.01.001

Bejar, I. I., Deane, P. D., Flor, M., & Chen, J. (2017). Evidence of the generalization and construct representation inferences for the GRE® revised General Test sentence equivalence item type. ETS Research Report Series, (1), 1–25. https://doi.org/10.1002/ets2.12134

Biber, D., & Gray, B. (2013). Discourse characteristics of writing and speaking task types on the TOEFL iBT: A lexico-grammatical analysis. ETS TOEFL Research Report Series.

Bogorevich, V. (2018). Native and non-native raters of L2 speaking performance: Accent familiarity and cognitive processes. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Carroll, P. E., & Bailey, A. L. (2016). Do decision rules matter? A descriptive study of English language proficiency assessment classifications for English-language learners and native English speakers in fifth grade. Language Testing, 33(1), 23–52. https://doi.org/10.1177/0265532215576380

Chapelle, C. A., Chung, Y.-R., Hegelheimer, V., Pendar, N., & Xu, J. (2010). Towards a computer-delivered test of productive grammatical ability. Language Testing, 27(4), 443–469. https://doi.org/10.1177/0265532210367633

Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, 32(3), 385–405. https://doi.org/10.1177/0265532214565386

Checa-García, I., & Guiberson, M. (2019). Test validity in morphosyntactic measures for typical and SLI incipient Spanish–English bilinguals. Language Testing, 36(1), 77–100. https://doi.org/10.1177/0265532217724603

Cheng, L., & Sun, Y. (2015). Interpreting the impact of the Ontario Secondary School Literacy Test on second language students within an argument-based validation framework. Language Assessment Quarterly, 12(1), 50–66.

Chung, Y.-R. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Deygers, B., van den Branden, K., & van Gorp, K. (2018). University entrance language tests: A matter of justice. Language Testing, 35(4), 449–476. https://doi.org/10.1177/0265532217706196

Doe, C. D. (2013). Validating the Canadian academic English language assessment for diagnostic purposes from three perspectives: Scoring, teaching, and learning. Unpublished doctoral dissertation, Queen’s University, Kingston, ON.

Enright, M. K., & Quinlan, T. (2010). Complementing human judgment of essays written by English language learners with e-rater® scoring. Language Testing, 27(3), 317–334.

Esfandiari, M. R., Riasati, M. J., Vaezian, H., & Rahimi, F. (2018). A quantitative analysis of TOEFL iBT using an interpretive model of test validity. Language Testing in Asia, 8(1), 7. https://doi.org/10.1186/s40468-018-0062-7

Frost, K., Elder, C., & Wigglesworth, G. (2011). Investigating the validity of an integrated listening-speaking task: A discourse-based analysis of test takers’ oral performances. Language Testing, 29(3), 345–369. https://doi.org/10.1177/0265532211424479

Gaillard, S. (2014). The elicited imitation task as a method for French proficiency assessment in institutional and research settings. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.

Gu, L., Lockwood, J., & Powers, D. E. (2015). Evaluating the TOEFL Junior® standard test as a measure of progress for young English language learners. ETS Research Report Series. https://doi.org/10.1002/ets2.12064

Harsch, C., Ushioda, E., & Ladroue, C. (2017). Investigating the predictive validity of TOEFL iBT® test scores and their use in informing policy in a United Kingdom University setting. ETS Research Report Series, (1), 1–80. https://doi.org/10.1002/ets2.12167

He, L., & Min, S. (2017). Development and validation of a computer adaptive EFL test. Language Assessment Quarterly, 14(2), 160–176. https://doi.org/10.1080/15434303.2016.1162793

Isbell, D. R. (2017). Assessing C2 writing ability on the Certificate of English Language Proficiency: Rater and examinee age effects. Assessing Writing, 34, 37–49. https://doi.org/10.1016/j.asw.2017.08.004

Jia, Y. (2013). Justifying the use of a second language oral test as an exit test in Hong Kong: An application of assessment use argument framework. Unpublished doctoral dissertation, University of California, Los Angeles.

Johnson, R. C. (2011). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Unpublished doctoral dissertation, Macquarie University, Sydney, Australia.

Johnson, R. C., & Riazi, A. M. (2015). Accuplacer Companion in a foreign language context: An argument-based validation of both test score meaning and impact. Papers in Language Testing and Assessment, 4(1), 31–58.

Jun, H. S. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Kadir, A. K. (2008). Framing a validity argument for test use and impact: The Malaysian public service experience. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign, Champaign, IL.

Kelly-Riley, D., & Elliot, N. (2014). The WPA Outcomes Statement, validation, and the pursuit of localism. Assessing Writing, 21, 89–103.

Kim, E.-Y. J. (2017). The TOEFL iBT writing: Korean students’ perceptions of the TOEFL iBT writing test. Assessing Writing, 33, 1–11. https://doi.org/10.1016/J.ASW.2017.02.001

Klebanov, B., Ramineni, C., Kaufer, D., Yeoh, P., & Ishizaki, S. (2017). Advancing the validity argument for standardized writing tests using quantitative rhetorical analysis. Language Testing, 36(1), 125–144. https://doi.org/10.1177/0265532217740752

Knoch, U., Macqueen, S., & O’Hagan, S. (2014). An investigation of the effect of task type on the discourse produced by students at various score levels in the TOEFL iBT® Writing Test. ETS Research Report Series. https://doi.org/10.1002/ets2.12038

Koizumi, R., In’nami, Y., Asano, K., & Agawa, T. (2016). Validity evidence of Criterion® for assessing L2 writing proficiency in a Japanese university context. Language Testing in Asia, 6(5), 1–26. https://doi.org/10.1186/s40468-016-0027-7

Kumazawa, T., Shizuka, T., Mochizuki, M., & Mizumoto, M. (2016). Validity argument for the VELC Test® score interpretations and uses. Language Testing in Asia, 16, 1.

Kyle, K., Crossley, S. A., & McNamara, D. S. (2016). Construct validity in TOEFL iBT speaking tasks: Insights from natural language processing. Language Testing, 33(3), 319–340. https://doi.org/10.1177/0265532215587391

LaFlair, G. T., & Staples, S. (2017). Using corpus linguistics to examine the extrapolation inference in the validity argument for a high-stakes speaking assessment. Language Testing, 34(4), 451–475. https://doi.org/10.1177/0265532217713951

Lallmamode, S. P., Mat Daud, N., & Abu Kassim, N. L. (2016). Development and initial argument-based validation of a scoring rubric used in the assessment of L2 writing electronic portfolios. Assessing Writing, 30, 44–62. https://doi.org/10.1016/j.asw.2016.06.001

Lesnov, R. (2018). The role of content-rich visuals in the L2 academic listening assessment construct. Unpublished doctoral dissertation, Northern Arizona University, Flagstaff, AZ.

Li, S. (2018). Developing a test of L2 Chinese pragmatic comprehension ability. Language Testing in Asia, 8(1), 3. https://doi.org/10.1186/s40468-018-0054-7

Li, Z. (2015a). Using a self-assessment of English use as a tool to validate the English Placement Test. Papers in Language Testing and Assessment, 3(2), 59–96.

Li, Z. (2015b). An argument-based validation study of the English Placement Test (EPT): Focusing on the inferences of extrapolation and ramification. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan, Ann Arbor, MI.

Link, S. M. (2015). Development and validation of an automated essay scoring engine to assess students’ development across program levels. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Llosa, L. (2005). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency. Unpublished doctoral dissertation, University of California, Los Angeles.

Llosa, L. (2007). Validating a standards-based classroom assessment of English proficiency: A multitrait-multimethod approach. Language Testing, 24(4), 489–515.

Llosa, L., & Malone, M. E. (2018). Comparability of students’ writing performance on TOEFL iBT and in required university writing courses. Language Testing. https://doi.org/10.1177/0265532218763456

Mendoza, A., & Knoch, U. (2018). Examining the validity of an analytic rating scale for a Spanish test for academic purposes using the argument-based approach to validation. Assessing Writing, 35, 41–55. https://doi.org/10.1016/j.asw.2017.12.003

Mozgalina, A. (2015). Applying an argument-based approach for validating language proficiency assessments in second language acquisition research: The elicited imitation test for Russian. Unpublished doctoral dissertation, Georgetown University, Washington, DC.

Oh, S. R. (2018). Investigating test-takers’ use of linguistic tools in second language academic writing assessment. Unpublished doctoral dissertation, Teachers College, Columbia University, New York, NY.

Pardo-Ballester, C. (2007). The development of a web-based Spanish listening placement exam. Unpublished doctoral dissertation, University of California, Davis, CA.

Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish listening exam: Test usefulness evaluation. Language Assessment Quarterly, 7(2), 137–159.

Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Riazi, A. M. (2016). Comparing writing performance in TOEFL-iBT and academic assignments: An exploration of textual features. Assessing Writing, 28, 15–27. https://doi.org/10.1016/j.asw.2016.02.001

Santos, V. (2017). A computer-adaptive test of productive and contextualized academic vocabulary breadth in English (CAT-PAV): Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Sawaki, Y., & Sinharay, S. (2013). Investigating the value of section scores for the TOEFL iBT® TEST. ETS Research Report Series, (2), i–113. https://doi.org/10.1002/j.2333-8504.2013.tb02342.x

Sawaki, Y., & Sinharay, S. (2018). Do the TOEFL iBT® section scores provide value-added information to stakeholders? Language Testing, 35(4), 529–556. https://doi.org/10.1177/0265532217716731

Schmidgall, J. E. (2017). The consistency of TOEIC® speaking scores across ratings and tasks. ETS Research Report Series, (1), 1–8. https://doi.org/10.1002/ets2.12178

Schmidgall, J. E., Getman, E. P., & Zu, J. (2018). Screener tests need validation too: Weighing an argument for test use against practical concerns. Language Testing, 35(4), 583–607. https://doi.org/10.1177/0265532217718600

Sims, J. M., & Kunnan, A. J. (2016). Developing evidence for a validity argument for an English placement exam from multi-year test performance data. Language Testing in Asia, 6(1), 1. https://doi.org/10.1186/s40468-016-0024-x

Tominaga, W. (2014). Validating the scoring inference of the Japanese OPI ratings: The use of extended turns, connective expressions, and discourse organization. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Trace, J. (2017). A validation argument for cloze test item function in second language assessment. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Trace, J., Janssen, G., & Meier, V. (2017). Measuring the impact of rater negotiation in writing performance assessment. Language Testing, 34(1), 3–22. https://doi.org/10.1177/0265532215594830

Wang, H. (2010). Investigating the justifiability of an additional test use: An application of assessment use argument to an English as a foreign language test. Unpublished doctoral dissertation, University of California, Los Angeles.

Weigle, S. C. (2011). Validation of automated scores of TOEFL iBT® tasks against nontest indicators of writing ability. ETS Research Report Series, (2), i–63.

Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.

Youn, S. J. (2013). Validating task-based assessment of L2 pragmatics in interaction using mixed methods. Unpublished doctoral dissertation, University of Hawai’i at Manoa.

Youn, S. J. (2015). Validity argument for assessing L2 pragmatics in interaction using mixed methods. Language Testing, 32(2), 199–225. https://doi.org/10.1177/0265532214557113

Xi, X., Higgins, D., Zechner, K., & Williamson, D. M. (2008). Automated scoring of spontaneous speech using SpeechRater v.1.0. ETS Research Report Series, (2).

Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 2(1), 1–34. https://doi.org/10.1207/s15434311laq0201_1

Bachman, L. F., & Palmer, A. S. (2010). Language assessment in practice: Developing language assessment and justifying their use in the real world. Oxford: Oxford University Press.

Brennan, R. L. (2013). Commentary on “Validating the interpretations and uses of test scores.” Journal of Educational Measurement, 50(1), 74–83. https://doi.org/10.1111/jedm.12001

Chapelle, C. A. (2021). Argument-based validation in testing and assessment. Thousand Oaks, CA: Sage Publications.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2008). Building a validity argument for the Test of English as a Foreign Language. New York and London: Routledge.

Chapelle, C. A., Enright, M. K., & Jamieson, J. (2010). Does an argument-based approach to validity make a difference? Educational Measurement: Issues and Practice, 29(1), 3–13.

Chapelle, C. A., & Voss, E. (2013). Evaluation of language tests through validation research. In Kunnan, A. J. (Ed.), The companion to language assessment III:9:65 (pp. 1079–1097). Chichester: John Wiley and Sons, Inc.

Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527–535.

Kane, M. T. (2004). Certification testing as an illustration of Argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135–170. https://doi.org/10.1207/s15366359mea0203_1

Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 17–64). Westport, CT: Greenwood Publishing.

Kane, M. T. (2012). Validating score interpretations and uses. Language Testing, 29(1), 3–17.

Kane, M. T. (2013a). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1–73. http://doi.org/10.1111/jedm.12000

Kane, M. T. (2013b). Validation as a pragmatic, scientific activity. Journal of Educational Measurement, 50(1), 115–122. http://doi.org/10.1111/jedm.12007

Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192

Messick, S. (1989). Validity. In Linn, R. (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education.

Norris, J. M., & Ortega, L. (2007). The future of research synthesis in applied linguistics: Beyond art or science. TESOL Quarterly, 41(4), 805–815. https://doi.org/10.1002/j.1545-7249.2007.tb00105.x

Siddaway, A. P., Wood, A. M., & Hedges, L. V. (2019). How to do a systematic review: A best practice guide for conducting and reporting narrative reviews, meta-analyses, and meta-syntheses. Annual Review of Psychology, 70(1), 747–770. https://doi.org/10.1146/annurev-psych-010418-102803

Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27(2), 147–170.