
Part II - Investigating Score Interpretations

Published online by Cambridge University Press: 14 January 2021

Carol A. Chapelle (Iowa State University)
Erik Voss (Teachers College, Columbia University)

Type: Chapter
Information: Validity Argument in Language Testing: Case Studies of Validation Research, pp. 71–232
Publisher: Cambridge University Press
Print publication year: 2021


References

AAS. (2012). Air traffic control. Army Aviation School.Google Scholar
Alderson, J. C. (2006). Final report on a survey of aviation English tests: Lancaster University and the European Organisation for the Safety of Air Navigation (Eurocontrol).Google Scholar
Alderson, J. C. (2009). Air safety, language assessment policy, and policy implementation: The case of Aviation English. Annual Review of Applied Linguistics, 29, 168187. https://doi.org/10.1017/s0267190509090138CrossRefGoogle Scholar
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. Cambridge: Cambridge University Press.Google Scholar
Bachman, L. F., & Palmer, A. S. (1996). Language testing in practice: Designing and developing useful language tests (Vol. 1). Oxford: Oxford University Press.Google Scholar
Bachman, L. F., & Palmer, A. (2010). Language assessment in practice: Developing language assessments and justifying their use in the real world. Oxford: Oxford University Press.Google Scholar
Bowen, G. (2009). Document analysis as a qualitative research method. Qualitative Research Journal, 9, 2740.CrossRefGoogle Scholar
Brown, J. D. (2004). Performance assessment: Existing literature and directions for research. Second Language Studies, 22(2), 91139.Google Scholar
CAA. (2014). Manual on the English Language Proficiency Assessment (ICAO language proficiency requirements). Republic of Moldova: Civil Aviation Authority.Google Scholar
Chapelle, C. A. (1998). Construct definition and validity inquiry in SLA research. In Bachman, L. F. & Cohen, A. D. (Eds.), Interfaces between second language acquisition and language resting research (pp. 3270). New York: Cambridge University Press.Google Scholar
Chapelle, C. A. (2008). The TOEFL validity argument. In Chapelle, C. A., Enright, M. E., & Jamieson, J. (Eds.), Building a validity argument for the Test of English as a Foreign Language (pp. 319350). New York: Routledge.Google Scholar
Chapelle, C. A. (2012). Validity argument for language assessment: The framework is simple. Language Testing, 29(1), 1927.CrossRefGoogle Scholar
Chapelle, C. A., Enright, M. K., & Jamieson, J. (Eds.). (2008). Building a validity argument for the test of English as a foreign language. New York: Routledge.Google Scholar
Chapelle, C. A., Schmidgall, J., Lopez, A., Blood, I., Wain, J., Cho, Y., Hutchison, A., Lee, H.-W., & Dursun, A. (2018). Designing a prototype tablet-based learning-oriented assessment for middle school English learners: An evidence-centered design approach. ETS Research Report Series, 2018(1), 155.CrossRefGoogle Scholar
Corbin, J., & Strauss, A. (2008). Basics of qualitative research (3rd ed.). Thousand Oaks, CA: Sage Publications.Google Scholar
Cushing, S. (1994). Fatal words: Communication clashes and aircraft crashes. Chicago: University of Chicago Press.Google Scholar
Douglas, D. (2000). Assessing languages for specific purposes. Cambridge: Cambridge University Press.Google Scholar
Emery, H. J. (2014). Developments in LSP testing 30 years on? The case of aviation English. Language Assessment Quarterly, 11(2), 198215.CrossRefGoogle Scholar
Hamp-Lyons, L., & Lumley, T. (2001). Assessing language for specific purposes. Thousand Oaks, CA: Sage Publications.CrossRefGoogle Scholar
Hines, S. (2010). Evidence-centered design: The TOEIC speaking and writing tests. The Research Foundation for TOEIC: A Compendium of Studies, 7.1.Google Scholar
Kane, M. (2013). Articulating a validity argument. In The Routledge handbook of language testing (pp. 4861). London: Routledge.Google Scholar
Krosnick, J. A., Narayan, S. S., & Smith, W. R. (1996). Satisficing in surveys: Initial evidence. New Directions for Evaluation, 70, 2944.CrossRefGoogle Scholar
Long, M. H., & Norris, J. M. (2000). Task-based teaching and assessment. Encyclopedia of Language Teaching, 597–603.Google Scholar
McNamara, T. F. (1996). Measuring second language performance. Harlow: Longman.Google Scholar
Messick, S. (1989). Validity. In Linn, R. L. (Ed.), Educational measurement (3rd ed., pp. 6068). New York: Macmillan.Google Scholar
Messick, S. (1994). The interplay of evidence and consequences in the validation of performance assessments. Educational Researcher, 23(2), 1323.CrossRefGoogle Scholar
Mislevy, R. J. (2006). Cognitive psychology and educational assessment. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 257305). Westport, CT: American Council on Education/Praeger Publishers.Google Scholar
Mislevy, R. J. (2013). Evidence-centered design for simulation-based assessment. Military Medicine, 178(Suppl_10), 107114.CrossRefGoogle ScholarPubMed
Mislevy, R. J., Almond, R. G., & Lukas, J. F. (2003). A brief introduction to evidence-centered design. ETS Research Report Series, 2003(1), i29.CrossRefGoogle Scholar
Mislevy, R. J., & Haertel, G. D. (2006). Implications of evidence‐centered design for educational testing. Educational Measurement: Issues and Practice, 25(4), 620.CrossRefGoogle Scholar
Mislevy, R. J., & Steinberg, L. S. (2003). Focus article: On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1(1), 362.Google Scholar
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2002). Design and analysis in task-based language assessment. Language Testing, 19(4), 477496.CrossRefGoogle Scholar
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 367.Google Scholar
Mislevy, R. J., Steinberg, L., Almond, R. G., & Lucas, J. F. (2006). Concepts, terminology, and basic models of evidence-centered design. In Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 4982). Mahwah, NJ: Lawrence Erlbaum.Google Scholar
Norris, J. M. (2001). Identifying rating criteria for task-based EAP assessment. In Hudson, T. D. & Brown, J. D. (Eds.), A focus on language test development: Expanding the language proficiency construct across a variety of tests (pp. 163204). Honolulu: University of Hawai’i Press.Google Scholar
Norris, J. M. (2014). How do we assess task-based performance? Invited LARC/CALPER testing and assessment webinar.Google Scholar
Norris, J., Brown, J. D., Hudson, T., & Yoshioka, J. (1998). Designing second language performance assessments. Technical report 18, University of Hawaii, Honolulu.Google Scholar
Park, M. (2015). Development and validation of virtual interactive tasks for an aviation English assessment. Doctoral dissertation, Iowa State University, Ames, IA.Google Scholar
Pearlman, M. (2008). Finalizing the test blueprint. In Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.), Building a validity argument for the Test of English as a Foreign Language™ (pp. 227258). Philadelphia, PA: Routledge.Google Scholar
Zokić, M., Boras, D., & Lazić, N. (2012). Computer-aided Aviation English testing on example of RELTA test. Paper presented at the 2012 Proceedings of the 35th International Convention MIPRO.Google Scholar
Bachman, L. F. (2005). Building and supporting a case for test use. Language Assessment Quarterly, 19(4), 453476.Google Scholar
Bachman, L. F., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.Google Scholar
Canale, M. (1986). The promise and threat of computerized adaptive assessment of reading comprehension. In Stansfield, C. (Ed.), Technology and language testing (pp. 3045). Washington, DC: TESOL.Google Scholar
Chapelle, C. A. (2012). Conceptions of validity. In Fulcher, G. & Davidson, F. (Eds.), The Routledge handbook of language testing (pp. 2133). New York: Routledge.Google Scholar
Chapelle, C. A., Cotos, E., & Lee, J. (2015). Validity arguments for diagnostic assessment using automated writing evaluation. Language Testing, Language Testing, 33(2), 385405.CrossRefGoogle Scholar
Chapelle, C. A., & Douglas, D. (2006). Assessing language through computer technology. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Chapelle, C. A., Enright, M. K., & Jamieson, J. (Eds.). (2008). Building a validity argument for the Test of English as a Foreign LanguageTM. New York: Routledge.Google Scholar
Chung, Y. (2014). A test of productive English grammatical ability in academic writing: Development and validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.Google Scholar
Cotos, E., & Chung, Y.-R. (2018). Domain description: Validating the interpretation of the TOEFL iBT® speaking scores for international teaching assistant screening and certification purposes. TOEFL Research Report No. RR-85. Princeton, NJ: Educational Testing Service. https://doi.org/10.1002/ets2.12233Google Scholar
Creswell, J. W., & Plano Clark, V. L. (2011). Designing and conducting mixed methods research (2nd ed.). Thousand Oaks, CA: Sage Publications.Google Scholar
Elder, C., Barkhuizen, G., Knoch, U., & Randow, J. V. (2007). Evaluating rater responses to an online training program for L2 writing assessment. Language Testing, 24(1), 3764.CrossRefGoogle Scholar
Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. New York: Aldine.Google Scholar
Jun, H. (2014). A validity argument for the use of scores from a web-search-permitted and web-source-based integrated writing test. Unpublished doctoral dissertation, Iowa State University, Ames, IA.Google Scholar
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527535.CrossRefGoogle Scholar
Kane, M. T. (2001). Current concerns in validity theory. Journal of Educational Measurement, 38(4), 319342.CrossRefGoogle Scholar
Kane, M. T. (2004). Certification testing as an illustration of argument-based validation. Measurement: Interdisciplinary Research and Perspectives, 2(3), 135170.Google Scholar
Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 1764). Westport, CT: American Council on Education.Google Scholar
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 173.CrossRefGoogle Scholar
Knoch, U., & Chapelle, C. A. (2018). Validation of rating processes within an argument-based framework. Language Testing, 35(4), 477499.CrossRefGoogle Scholar
McNamara, T. F. (1996). Measuring second language performance. London: Longman.Google Scholar
Ockey, G. J. (2009). The effects of a test taker’s group members’ personalities on the test taker’s second language group oral discussion test scores. Language Testing, 26(2), 161186.CrossRefGoogle Scholar
Yang, H. (2016). Integration of a web-based rating system with an oral proficiency interview test: Argument-based approach to validation. Unpublished doctoral dissertation, Iowa State University, Ames, IA.Google Scholar
Bachman, L. F., Lynch, B. K., & Mason, M. (1995). Investigating variability in tasks and rater judgments in a performance test of foreign language speaking. Language Testing, 12(2), 238257.CrossRefGoogle Scholar
Barkaoui, K. (2007). Rating scale impact on EFL essay marking: A mixed-method study. Assessing Writing, 12(2), 86107.CrossRefGoogle Scholar
Bouwer, R., Béguin, A., Sanders, T., & van den Bergh, H. (2015). Effect of genre on the generalizability of writing scores. Language Testing, 32(1), 83100.CrossRefGoogle Scholar
Bridges, G. (2010). Demonstrating cognitive validity of IELTS academic writing task 1. Cambridge ESOL Research Notes, 42, 2433. Retrieved from www.cambridgeenglish.org/images/23160-research-notes-42.pdfGoogle Scholar
Briesch, A. M., Swaminathan, H., Welsh, M., & Chafouleas, S. M. (2014). Generalizability theory: A practical guide to study design, implementation, and interpretation. Journal of School Psychology, 52(1), 1335.CrossRefGoogle ScholarPubMed
Briggs, D. C. (2004). Comment: Making an argument for design validity before interpretive validity. Measurement, 2(3), 171174.Google Scholar
Carr, N. T. (2011). Designing and analyzing language tests. Oxford: Oxford University Press.Google Scholar
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2008). Building a validity argument for the Test of English as a Foreign LanguageTM. New York: Routledge.Google Scholar
Choi, Y. D. (2018). Graphic-prompt tasks for assessment of academic English writing ability: An argument-based approach to investigating validity. Unpublished doctoral dissertation, Iowa State University, Ames, IA.Google Scholar
Conklin, K., Pellicer-Sánchez, A., & Carrol, G. (2018). Eye-tracking: A guide for applied linguistics research. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Creswell, J. W., & Plano Clark, V. L. (2007). Designing and conducting mixed methods research. London: Sage Publications.Google Scholar
Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10(1), 18.CrossRefGoogle Scholar
Cumming, A., Grant, L., Mulcahy-Ernt, P., & Powers, D. E. (2004). A teacher-verification study of speaking and writing prototype tasks for a new TOEFL. Language Testing, 21(2), 107145.CrossRefGoogle Scholar
Cumming, A., Kantor, R., Baba, K., Erdosy, U., Eouanzoui, K., & James, M. (2005). Differences in written discourse in independent and integrated prototype tasks for next generation TOEFL. Assessing Writing, 10(1), 543.CrossRefGoogle Scholar
Eckes, T. (2015). Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessment (2nd ed.). New York: Peter Lang.Google Scholar
Farahani, D. B., & Kashanifar, F. S. (2016). Graph writing test taking strategies and performance on the task: The role of academic background. Journal of Applied Linguistics and Language Research, 3(2), 5169.Google Scholar
Gebril, A. (2009). Score generalizability of academic writing tasks: Does one test method fit it all? Language Testing, 26(4), 507531.CrossRefGoogle Scholar
Gebril, A. (2010). Bringing reading-to-write and writing-only assessment tasks together: A generalizability analysis. Assessing Writing, 15(2), 100117.CrossRefGoogle Scholar
Hyland, K. (2006). English for academic purposes: An advanced resource book. New York: Routledge.CrossRefGoogle Scholar
IBM Corp. (2015). IBM SPSS statistics for Macintosh (version 23.0) [Computer software]. Armonk, NY: IBM Corp.Google Scholar
In’nami, Y., & Koizumi, R. (2016). Task and rater effects in L2 speaking and writing: A synthesis of generalizability studies. Language Testing, 33(3), 341366.CrossRefGoogle Scholar
Iowa State University. (2019). English placement test. Retrieved from https://apling.engl.iastate.edu/english-placement-test/Google Scholar
Jewitt, C. (2005). Multimodality, “reading”, and “writing” for the 21st century. Discourse: Studies in the Cultural Politics of Education, 26(3), 315331.Google Scholar
Jewitt, C. (2008). Multimodality and literacy in school classrooms. Review of Research in Education, 32(1), 241267.CrossRefGoogle Scholar
Kane, M. T. (2006). Validation. In Brennen, R. (Ed.), Educational measurement (4th ed., pp. 1764). Westport, CT: Greenwood Publishing.Google Scholar
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 173.CrossRefGoogle Scholar
Knoch, U. (2009). Diagnostic assessment of writing: A comparison of two rating scales. Language Testing, 26(2), 275304.CrossRefGoogle Scholar
Knoch, U., & Chapelle, C. A. (2017). Validation of rating processes within an argument-based framework. Language Testing, 35(4), 477499https://doi.org/10.1177/0265532217710049CrossRefGoogle Scholar
Knoch, U., & Sitajalabhorn, W. (2013). A closer look at integrated writing tasks: Towards a more focused definition for assessment purposes. Assessing Writing, 18(4), 300308.CrossRefGoogle Scholar
Lee, Y. W., & Kantor, R. (2007). Evaluating prototype tasks and alternative rating schemes for a new ESL writing test through G-theory. International Journal of Testing, 7(4), 353385.CrossRefGoogle Scholar
Lim, G. S. (2009). Prompt and rater effects in second language writing performance assessment. Unpublished doctoral dissertation, University of Michigan.Google Scholar
Linacre, J. M. (2014). Facets Rasch measurement computer program (version 3.71.4) [Computer software]. Chicago: Winsteps.com.Google Scholar
Mackey, A., & Gass, S. M. (2005). Second language research: Methodology and design. New York: Routledge.Google Scholar
Mickan, P., Slater, S., & Gibson, C. (2000). Study of response validity of the IELTS writing subtest. International English Language Testing System, 3, 2948.Google Scholar
Mushquash, C., & O’Connor, B. P. (2006). SPSS and SAS programs for generalizability theory analyses. Behavior Research Methods, 38(3), 542547.CrossRefGoogle ScholarPubMed
Ockey, G. J. (2012). Item response theory. In Fulcher, G. & Davidson, F. (Eds.), Routledge handbook of language testing (pp. 316328). London: Routledge.Google Scholar
O’Loughlin, K., & Wigglesworth, G. (2003). Task design in IELTS academic writing task 1: The effect of quantity and manner of presentation of information on candidate writing. IELTS research report #4. Retrieved from http://search.informit.com.au/documentSummary;dn=908957733867582;res=IELHSSGoogle Scholar
Plakans, L. (2008). Comparing composing processes in writing-only and reading-to-write test tasks. Assessing Writing, 13(2), 111129.CrossRefGoogle Scholar
Schoonen, R. (2005). Generalizability of writing scores: An application of structural equation modeling. Language Testing, 22(1), 130.CrossRefGoogle Scholar
Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. London: Sage Publications.Google Scholar
Shin, S. Y., & Ewert, D. (2015). What accounts for integrated reading-to-write task scores? Language Testing, 32(2), 259281.CrossRefGoogle Scholar
Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263287.CrossRefGoogle Scholar
Weigle, S. C. (1999). Investigating rater/prompt interactions in writing assessment: Quantitative and qualitative approaches. Assessing Writing, 6(2), 145178.CrossRefGoogle Scholar
Weigle, S. C. (2002). Assessing writing. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Yang, H. C. (2012a). A comparative study of composing processes in reading-and graph-based writing tasks. Language Testing in Asia, 2(3), 33.CrossRefGoogle Scholar
Yang, H. C. (2012b). Modeling the relationships between test-taking strategies and test performance on a graph-writing task: Implications for EAP. English for Specific Purposes, 31(3), 174187.CrossRefGoogle Scholar
Yang, H. C. (2016). Describing and interpreting graphs: The relationships between undergraduate writer characteristics and academic graph writing performance. Assessing Writing, 28, 2842.CrossRefGoogle Scholar
Yu, G., Rea-Dickens, P., & Kiely, P. (2012). The cognitive processes of taking IELTS Academic Writing Task 1. IELTS research report #11. Retrieved from www.ielts.org/PDF/vol11_report_6_the_cognitive_processes.pdfGoogle Scholar
Abe, M., Kondo, Y., Kobayashi, Y., Murakami, A., & Fujiwara, Y. (July 2018). Initial findings from a longitudinal learner corpus: A year-long development of L2 speaking performance. Paper presented at the 13th Teaching and Language Corpora Conference 2018, University of Cambridge, UK.Google Scholar
ALC. (2014). ALC eigo kyoiku jittai repoto 2014: Supikingu tesuto to gakushu adobaisu gyomu wo chushinni [ALC English Education Field Survey Report 2014: Focus on a speaking test and advice-giving business operations]. Tokyo: ALC. Retrieved from www.alc.co.jp/company/report/Google Scholar
ALC Educational Research Institute. (2016). Nihonjin no eigo supikingu noryoku: Risuningu, ridingu ryoku tono kankeisei ni miru eigo unyo noryoku no jittai [English speaking proficiency of Japanese learners: From a viewpoint of relationships between listening and reading abilities]. ALC English Education Field Survey Report, Vol. 7. Tokyo: Author. Retrieved from www.alc.co.jp/company/report/pdf/alc_report_20160627.pdfGoogle Scholar
ALC Educational Research Institute. (2018). Nihon no kokosei no eigo supikingu noryoku jittai chosa III: Koko ichinenji kara sannenji de kokosei no eigoryoku wa donoyoni henkashitaka [How senior high school students’ English speaking proficiency changed from the first year to the third year]. ALC English Education Field Survey Report, Vol. 11. Tokyo: Author. Retrieved from www.alc.co.jp/company/report/pdf/alc_report_20180731.pdfGoogle Scholar
Aryadoust, V. (2013). Building a validity argument for a listening test of academic proficiency. Newcastle: Cambridge Scholars Publishing.Google Scholar
Barkaoui, K. (2017). Examining repeaters’ performance on second language proficiency tests: A review and a call for research [Commentary]. Language Assessment Quarterly, 14, 420431. doi:10.1080/15434303.2017.1347790CrossRefGoogle Scholar
Chapelle, C. A., Enright, M. K., & Jamieson, J. M. (Eds.). (2008). Building a validity argument for the Test of English as a Foreign Language™. New York: Routledge.Google Scholar
Creswell, J. W., & Plano Clark, V. L. (2018). Designing and conducting mixed methods research (3rd ed.). Thousand Oaks, CA: Sage.Google Scholar
Glass, G. V., & Hopkins, K. D. (1996). Statistical methods in education and psychology (3rd ed.). Boston, MA: Allyn & Bacon.Google Scholar
Gliner, J. A., Morgan, G. A., & Leech, N. L. (2017). Research methods in applied settings: An integrated approach to design and analysis (3rd ed.). New York: Routledge.Google Scholar
Harvill, L. M. (1991). An NCME instructional module on standard error of measurement [Instructional topics in educational measurement]. Educational Measurement: Issues and Practice, 10(2), 181189. doi:10.1111/j.1745-3992.1991.tb00195.xCrossRefGoogle Scholar
Henning, G. (1987). A guide to language testing: Development, evaluation, research. Boston, MA: Heinle & Heinle.Google Scholar
Johnson, R. C. (2012). Assessing the assessments: Using an argument-based validity framework to assess the validity and use of an English placement system in a foreign language context. Doctoral dissertation, Macquarie University, Sydney, Australia.Google Scholar
Kane, M. T. (2006). Validation. In Brennan, R. L. (Ed.), Educational measurement (4th ed., pp. 1764). Westport, CT: American Council on Education and Praeger.Google Scholar
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50, 173. doi:10.1111/jedm.12000CrossRefGoogle Scholar
Knoch, U., & Chapelle, C. A. (2018). Validation of rating processes within an argument-based framework. Language Testing, 35, 477499. doi:10.1177/0265532217710049CrossRefGoogle Scholar
Koizumi, R. (2018). Eigo yongino tesuto no erabikata to tsukaikata: Datosei no kantenkara [How we can select and use English four-skill tests: From the viewpoint of validity]. Tokyo: ALC.Google Scholar
Koizumi, R., In’nami, Y., Azuma, J. Asano, K., Agawa, T., & Eberl, D. (2015). Assessing L2 proficiency growth: Considering regression to the mean and the standard error of difference. Shiken, 19(1), 315. Retrieved from http://teval.jalt.org/node/16Google Scholar
Kunnan, A. J. (2018). Evaluating language assessments. New York: Routledge.Google Scholar
Llosa, L. (2008). Building and supporting a validity argument for a standards-based classroom assessment of English proficiency based on teacher judgments. Educational Measurement: Issues and Practice, 27(3), 3242. doi:10.1111/j.1745-3992.2008.00126.xCrossRefGoogle Scholar
Marsden, E., & Torgerson, C. J. (2012). Single group, pre- and post-test research designs: Some methodological concerns. Oxford Review of Education, 38, 583616. doi:10.1080/03054985.2012.731208CrossRefGoogle Scholar
McManus, I. C. (2012). The misinterpretation of the standard error of measurement in medical education: A primer on the problems, pitfalls and peculiarities of the three different standard errors of measurement. Medical Teacher, 34, 569576. doi:10.3109/0142159X.2012.670318CrossRefGoogle ScholarPubMed
Mizumoto, A., & Plonsky, L. (2016). R as a lingua franca: Advantages of using R for quantitative research in applied linguistics. Applied Linguistics, 37, 284291. doi:10.1093/applin/amv025CrossRefGoogle Scholar
Ogino, K. (2002). Eigo supikingu noryoku tesuto SST towa nanika [What is the Standard Speaking Test?]. In Waseda Oral Communication Research Institute Research Report (pp. 29). Tokyo: Waseda Oral Communication Research Institute.Google Scholar
Pardo-Ballester, C. (2010). The validity argument of a web-based Spanish Listening Exam: Test usefulness evaluation. Language Assessment Quarterly, 7, 137159. doi:10.1080/15434301003664188CrossRefGoogle Scholar
Riazi, A. M. (2016). The Routledge encyclopedia of research methods in applied linguistics: Quantitative qualitative, and mixed-methods research. Oxon, Oxford: Routledge.CrossRefGoogle Scholar
Schwarz, W., & Reike, D. (2018). Regression away from the mean: Theory and examples. British Journal of Mathematical and Statistical Psychology, 71, 186203. doi:10.1111/bmsp.12106CrossRefGoogle ScholarPubMed
Suzuki, Y., & Koizumi, R. (in press). Using equivalent test forms in SLA pretest-posttest design research. In Winke, P. & Brunfaut, T. (Eds.), The Routledge handbook of second language acquisition and language testing. New York: Routledge.Google Scholar
Xi, X. (2010). How do we go about investigating test fairness? Language Testing, 27, 147170. doi:10.1177/0265532209349465Google Scholar
Zhang, Y. (2008). Repeater analyses for TOEFL iBT. Research Memorandum 08-05. Princeton, NJ: Educational Testing Service. Retrieved from www.ets.org/research/policy_research_reports/publications/report/2008/ibyaGoogle Scholar
Zhou, Y. (2015). Comparing ratings of a face-to-face and telephone-mediated speaking test. JACET Journal, 59, 3352. Retrieved from http://dl.ndl.go.jp/info:ndljp/pid/10501826?tocOpened=1Google Scholar
Alderson, J. C. (2000). Assessing reading. Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Alderson, C., & Kremmel, B. (2013). Re-examining the content validation of a grammar test: The (im)possibility of distinguishing vocabulary and structural knowledge. Language Testing, 30(4), 535556.CrossRefGoogle Scholar
Bachman, L., & Palmer, A. (2010). Language assessment in practice. Oxford: Oxford University Press.Google Scholar
Bahns, J., & Eldaw, M. (1993). Should we teach EFL students collocations? System, 21(1), 101114.CrossRefGoogle Scholar
Biskup, D. (1992). L1 influence on learners’ renderings of English collocations: A Polish/German empirical study. In Arnaud, P. J. L. & Bejoint, H. (Eds.), Vocabulary and applied linguistics (pp. 8593). London: Macmillan.CrossRefGoogle Scholar
Chapelle, C. A. (1994). Are C-tests valid measures for L2 vocabulary research? Second Language Research, 10(2), 157187.CrossRefGoogle Scholar
Chapelle, C. A. (1998). Construct definition and validity inquiry in SLA research. In Bachman, L. F. & Cohen, A. D. (Eds.), Second language acquisition and language testing interfaces. Cambridge: Cambridge University Press.Google Scholar
Chapelle, C. A. (2020). Argument-based validation in testing and assessment. Sage.Google Scholar
Chapelle, C. A., Enright, M. K., & Jamieson, J. (2008). Building a validity argument for the Test of English as a Foreign Language. New York and London: Routledge.Google Scholar
Cheng, W., Greaves, C., Sinclair, J. McH., & Warren, M. (2009). Uncovering the extent of the phraseological tendency: Towards a systematic analysis of concgrams. Applied Linguistics, 30(2), 236252.CrossRefGoogle Scholar
Conrad, S. M., & Biber, D. (2005). The frequency and use of lexical bundles in conversation and academic prose. Lexicographica, 20, 5671.Google Scholar
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34, 213238.CrossRefGoogle Scholar
Creswell, J. W., & Plano Clark, V. L., (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.Google Scholar
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281302.CrossRefGoogle ScholarPubMed
Davies, M. (2008). The corpus of contemporary American English: 425 million words, 1990–present. Retrieved from http://corpus.byu.edu/coca/Google Scholar
Durrant, P. (2009). Investigating the viability of a collocation list for students of English for academic purposes. English for Specific Purposes, 28(3), 157169.CrossRefGoogle Scholar
Ellis, N. C. (2006). SLA: The associative cognitive CREED. In VanPatten, B., Williams, J., & Williams, A. F. (Eds.), Theories in second language acquisition: An introduction. Mahwah, NJ: Erlbaum.Google Scholar
Howarth, P. (1996). Phraseology in English academic writing: Some implications for language learning and dictionary making. Tübingen, Germany: Niemeyer.CrossRefGoogle Scholar
Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1), 2444.CrossRefGoogle Scholar
Hyland, K. (2012). Bundles in academic discourse. Annual Review of Applied Linguistics, 32, 150169.CrossRefGoogle Scholar
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527535.CrossRefGoogle Scholar
Kane, M. T. (2006). Validation. In Brennen, R. (Ed.), Educational measurement (4th ed., pp. 1764). Westport, CT: Greenwood.Google Scholar
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 173.CrossRefGoogle Scholar
Karimi, N. (2011). C-test and vocabulary knowledge. Language Testing in Asia, 10(4), 738.CrossRefGoogle Scholar
Laufer, B., & Nation, P. (1999). A vocabulary size test of controlled productive ability. Language Testing, 16(1), 3351.CrossRefGoogle Scholar
Lee, D. Y. W. (2001). Genres, registers, text types, domains and styles: Clarifying the concepts and navigating a path through the BNC jungle. Language Learning and Technology, 5(3), 3772.Google Scholar
Leong, C. K., Ho, M. K., Chang, J., & Hau, K. T. (2013). Differential importance of language components in determining secondary school students’ Chinese reading literacy performance. Language Testing, 30(4), 419439.CrossRefGoogle Scholar
Messick, S. (1989). Validity. In Linn, R. L. (Ed.), Educational measurement (pp. 13104). New York: Macmillan.Google Scholar
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24, 223242.CrossRefGoogle Scholar
Qian, D. (1999). Assessing the roles of depth and breadth of vocabulary knowledge in reading comprehension. Canadian Modern Language Review, 56(2), 282307.CrossRefGoogle Scholar
Read, J. (2016). Some key issues in post-admission language assessment. In Read, J. (Ed.), Post-admission language assessment of university students. Switzerland: Springer.
Revier, R. L. (2009). Evaluating a new test of whole English collocations. In Gyllstad, J. & Barfield, A. (Eds.), Researching collocations in another language: Multiple interpretations (pp. 125–138). New York: Palgrave Macmillan.
Roche, R., Harrington, M., Sinha, Y., & Denman, C. (2016). Vocabulary recognition skill as a screening tool in English-as-a-Lingua-Franca university settings. In Read, J. (Ed.), Post-admission language assessment of university students. Switzerland: Springer.
Römer, U. (2017). Language assessment and the inseparability of lexis and grammar: Focus on the construct of speaking. Language Testing, 34(4), 477–492.
Shiotsu, T., & Weir, C. J. (2007). The relative significance of syntactic knowledge and vocabulary breadth in the prediction of reading comprehension test performance. Language Testing, 24(1), 99–128.
Voss, E. (2012). A validity argument for score meaning of a computer-based ESL academic collocational ability test based on a corpus-driven approach to test design. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
West, M. (1953). A general service list of English words. London: Longman, Green.
Atkinson, D., & Ramanathan, V. (1995). Cultures of writing: An ethnographic comparison of L1 and L2 university writing/language programs. TESOL Quarterly, 29(3), 539–568.
Costino, K. A., & Hyon, S. (2011). Sidestepping our “scare words”: Genre as a possible bridge between L1 and L2 compositionists. Journal of Second Language Writing, 20(1), 24–44.
Creswell, J. W., & Plano Clark, V. L. (2017). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.
Gardner, S., & Nesi, H. (2013). A classification of genre families in university student writing. Applied Linguistics, 34(1), 25–52.
Hale, G., Taylor, C., Bridgeman, B., Carson, J., Kroll, B., & Kantor, R. (1996). A study of writing tasks assigned in academic degree programs. TOEFL Report No. 54. Princeton, NJ: Educational Testing Service.
James, M. A. (2008). The influence of perceptions of task similarity/difference on learning transfer in second language writing. Written Communication, 25(1), 76–103.
James, M. A. (2014). Learning transfer in English-for-academic-purposes contexts: A systematic review of research. Journal of English for Academic Purposes, 14, 1–13.
Lee, J. (2016). Transfer from ESL academic writing to first year composition and other disciplinary courses: An assessment perspective. Unpublished doctoral dissertation, Iowa State University, Ames, IA.
Lee, J. (2019). A comparison of writing tasks in ESL writing and first-year composition courses: A case study of one US university. Language Teaching Research. https://doi.org/10.1016/S1060-3743(96)90020-X
Leki, I., & Carson, J. (1997). “Completely different worlds”: EAP and the writing experiences of ESL students in university courses. TESOL Quarterly, 31(1), 39–69.
Moore, T., & Morton, J. (2005). Dimensions of difference: A comparison of university writing and IELTS writing. Journal of English for Academic Purposes, 4(1), 43–66.
Nesi, H., & Gardner, S. (2012). Genres across the disciplines: Student writing in higher education. Cambridge: Cambridge University Press.
Redden, E. (2014, December 1). Teaching International Students. Inside Higher Ed. Retrieved from www.insidehighered.com/news/2014/12/01/increasing-international-enrollments-faculty-grapple-implications-classrooms
Zong, J., & Batalova, J. (2018, May 9). International Students in the United States. Migration Policy Institute. Retrieved from www.migrationpolicy.org/article/international-students-united-states
