
11 - Using Information from Multiple-Choice Distractors to Enhance Cognitive-Diagnostic Score Reporting

Published online by Cambridge University Press:  23 November 2009

Richard M. Luecht, Professor of Education Research and Methodology, University of North Carolina at Greensboro
Jacqueline Leighton, University of Alberta
Mark Gierl, University of Alberta

Summary

Unidimensional tests primarily measure only one proficiency trait or ability (Hambleton & Swaminathan, 1985). That is, we assume that a single proficiency trait can completely explain the response patterns observed for a population of test takers. However, most tests exhibit some multidimensionality (i.e., responses that depend on more than one proficiency trait or ability). Multidimensionality may be due to the cognitive complexity of the test items, motivational propensities of the test takers, or other more extraneous factors (Ackerman, 2005; Ackerman, Gierl, & Walker, 2003).
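For concreteness, the contrast can be written in IRT terms. A unidimensional three-parameter logistic (3PL) model of the kind discussed by Hambleton and Swaminathan (1985) conditions the probability of a correct response on a single trait, whereas a compensatory multidimensional extension conditions it on a vector of traits; the models below are illustrative of that distinction rather than models prescribed in this chapter:

\[
P(X_{ij}=1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + \exp\!\left[-a_i(\theta_j - b_i)\right]},
\qquad
P(X_{ij}=1 \mid \boldsymbol{\theta}_j) = c_i + \frac{1 - c_i}{1 + \exp\!\left[-(\mathbf{a}_i^{\mathsf{T}}\boldsymbol{\theta}_j + d_i)\right]},
\]

where \(a_i\), \(b_i\), and \(c_i\) are the usual item discrimination, difficulty, and lower-asymptote parameters, and \(\mathbf{a}_i\) and \(\boldsymbol{\theta}_j\) are the corresponding vectors in the multidimensional case.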

Diagnostically useful scores that profile examinees' strengths and weaknesses require well-behaved or principled multidimensional measurement information. This presents a challenge for established test development and psychometric scaling practices, which are aimed at producing unidimensional tests and maintaining unidimensional score scales so that accurate summative decisions can be made over time (e.g., college admissions, placement, or granting of a professional certificate or licensure). Any multidimensionality detected during the scaling process is treated as a “nuisance factor” that was not accounted for when designing the test items and building test forms (e.g., passage effects due to choices of topics, or method variance due to item types). In fact, most item response theory (IRT) scaling procedures regard multidimensionality and related forms of residual covariation in the response data as statistical misfit or aberrance (Hambleton & Swaminathan, 1985). Can this largely uncontrolled “misfit” or residual covariance be exploited for legitimate diagnostic purposes? Probably not.
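As an informal illustration of what “residual covariation” looks like in practice (not a procedure taken from this chapter), one can fit a unidimensional model and then correlate the item-level residuals; large off-diagonal correlations are exactly the unmodeled dependence that routine scaling flags as misfit. The sketch below is a minimal, Yen's Q3-style check assuming 3PL item parameters and ability estimates are already available; the function name and inputs are hypothetical.

import numpy as np

def residual_correlations(responses, theta, a, b, c):
    """Correlate item residuals after a unidimensional 3PL fit (a Q3-style check).

    responses : (n_persons, n_items) array of 0/1 scored responses
    theta     : (n_persons,) unidimensional ability estimates
    a, b, c   : (n_items,) 3PL discrimination, difficulty, and lower-asymptote parameters
    """
    z = a[None, :] * (theta[:, None] - b[None, :])             # logits under a single trait
    p = c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))   # model-implied P(correct)
    resid = responses - p                                      # covariation the single trait leaves behind
    return np.corrcoef(resid, rowvar=False)                    # large off-diagonal values signal misfit

# Hypothetical usage: q3 = residual_correlations(X, theta_hat, a_hat, b_hat, c_hat)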

Type: Chapter
Book: Cognitive Diagnostic Assessment for Education: Theory and Applications, pp. 319–340
Publisher: Cambridge University Press
Print publication year: 2007

References

Ackerman, T.A. (2005). Multidimensional item response theory modeling. In Maydeu-Olivares, A., & McArdle, J.J. (Eds.), Contemporary psychometrics (pp. 3–25). Mahwah, NJ: Erlbaum.
Ackerman, T.A., Gierl, M.J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37–53.
Cliff, N. (1987). Analyzing multivariate data. San Diego: Harcourt Brace Jovanovich.
Drasgow, F., Luecht, R.M., & Bennett, R.E. (in press). Technology in testing. In Brennan, R.L. (Ed.), Educational measurement (4th ed., pp. 471–515). Washington, DC: American Council on Education and Praeger Publishers.
Ferrara, S., Duncan, T., Perie, M., Freed, R., McGivern, J., & Chilukuri, R. (2003, April). Item construct validity: Early results from a study of the relationship between intended and actual cognitive demands in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
Folske, J.C., Gessaroli, M.E., & Swanson, D.B. (1999, April). Assessing the utility of an IRT-based method for using collateral information to estimate subscores. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.
Gorsuch, R.L. (1983). Factor analysis (2nd ed.). Mahwah, NJ: Erlbaum.
Hambleton, R.K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
Huff, K. (2004, April). A practical application of evidence-centered design principles: Coding items for skills. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Irvine, S.H., & Kyllonen, P.C. (2002). Item generation for test development. Mahwah, NJ: Erlbaum.
Jöreskog, K.G. (1969). Efficient estimation in image factor analysis. Psychometrika, 34, 183–202.
Leighton, J.P. (2004). Avoiding misconceptions, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23, 1–10.
Leighton, J.P., Gierl, M.J., & Hunka, S. (2004). The attribute hierarchy method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205–237.
Liu, J., Feigenbaum, M., & Walker, M.E. (2004, April). New SAT and new PSAT/NMSQT Spring 2003 field trial design. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Longford, N.T. (1997). Shrinkage estimation of linear combinations of true scores. Psychometrika, 62, 237–244.
Luecht, R.M. (1993, April). A marginal maximum likelihood approach to deriving multidimensional composite abilities under the generalized partial credit model. Paper presented at the annual meeting of the American Educational Research Association, Atlanta.
Luecht, R.M. (2001, April). Capturing, codifying and scoring complex data for innovative, computer-based items. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle.
Luecht, R.M. (2002, April). From design to delivery: Engineering the mass production of complex performance assessments. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans.
Luecht, R.M. (2005a, April). Extracting multidimensional information from multiple-choice question distractors for diagnostic scoring. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.
Luecht, R.M. (2005b). Item analysis. In Everitt, B.S., & Howell, D.C. (Eds.), Encyclopedia of statistics in behavioral science. London: Wiley.
Luecht, R.M. (2006, February). Computer-based approaches to diagnostic assessment. Invited presentation at the annual meeting of the Association of Test Publishers, Orlando, FL.
Luecht, R.M., Gierl, M.J., Tan, X., & Huff, K. (2006, April). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.
Mislevy, R.J., & Riconscente, M.M. (2005, July). Evidence-centered assessment design: Layers, structures, and terminology (PADI Technical Report 9). Menlo Park, CA: SRI International.
Nishisato, S. (1984). A simple application of a quantification method. Psychometrika, 49, 25–36.
O'Callaghan, R.K., Morley, M.E., & Schwartz, A. (2004, April). Developing skill categories for the SAT math section. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
VanderVeen, A. (2004, April). Toward a construct of critical reading for the new SAT. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Vevea, J.L., Edwards, M.C., & Thissen, D. (2002). User's guide for AUGMENT v.2: Empirical Bayes subscore augmentation software. Chapel Hill, NC: L.L. Thurstone Psychometric Laboratory.
Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., Rosa, K., Nelson, L., Swygert, K.A., & Thissen, D. (2001). Augmented scores – “Borrowing strength” to compute scores based upon small numbers of items. In Wainer, H., & Thissen, D. (Eds.), Test scoring (pp. 343–387). Mahwah, NJ: Erlbaum.
