
11 - Using Information from Multiple-Choice Distractors to Enhance Cognitive-Diagnostic Score Reporting

Published online by Cambridge University Press:  23 November 2009

Richard M. Luecht, Professor of Education Research and Methodology, University of North Carolina at Greensboro
Jacqueline Leighton, University of Alberta
Mark Gierl, University of Alberta

Summary

Unidimensional tests primarily measure only one proficiency trait or ability (Hambleton & Swaminathan, 1985). That is, we assume that a single proficiency trait can completely explain the response patterns observed for a population of test takers. However, most tests exhibit some multidimensionality (i.e., responses that depend on more than one proficiency trait or ability). Multidimensionality may be due to the cognitive complexity of the test items, motivational propensities of the test takers, or other more extraneous factors (Ackerman, 2005; Ackerman, Gierl, & Walker, 2003).
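For concreteness, the contrast can be written in IRT terms. A unidimensional three-parameter logistic (3PL) model of the kind discussed by Hambleton and Swaminathan (1985) conditions the probability of a correct response on a single trait, whereas a compensatory multidimensional extension conditions it on a vector of traits; the models below are illustrative of that distinction rather than models prescribed in this chapter:

\[
P(X_{ij}=1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + \exp\!\left[-a_i(\theta_j - b_i)\right]},
\qquad
P(X_{ij}=1 \mid \boldsymbol{\theta}_j) = c_i + \frac{1 - c_i}{1 + \exp\!\left[-(\mathbf{a}_i^{\mathsf{T}}\boldsymbol{\theta}_j + d_i)\right]},
\]

where \(a_i\), \(b_i\), and \(c_i\) are the usual item discrimination, difficulty, and lower-asymptote parameters, and \(\mathbf{a}_i\) and \(\boldsymbol{\theta}_j\) are the corresponding vectors in the multidimensional case.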

Diagnostically useful scores that profile examinees' strengths and weaknesses require well-behaved or principled multidimensional measurement information. This presents a challenge for established test development and psychometric scaling practices, which are aimed at producing unidimensional tests and maintaining unidimensional score scales so that accurate summative decisions can be made over time (e.g., college admissions, placement, or granting of a professional certificate or licensure). Any multidimensionality detected during the scaling process is treated as a “nuisance factor” that was not accounted for when designing the test items and building test forms (e.g., passage effects due to choices of topics, or method variance due to item types). In fact, most item response theory (IRT) scaling procedures regard multidimensionality and related forms of residual covariation in the response data as statistical misfit or aberrance (Hambleton & Swaminathan, 1985). Can this largely uncontrolled “misfit” or residual covariance be exploited for legitimate diagnostic purposes? Probably not.
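As an informal illustration of what “residual covariation” looks like in practice (not a procedure taken from this chapter), one can fit a unidimensional model and then correlate the item-level residuals; large off-diagonal correlations are exactly the unmodeled dependence that routine scaling flags as misfit. The sketch below is a minimal, Yen's Q3-style check assuming 3PL item parameters and ability estimates are already available; the function name and inputs are hypothetical.

import numpy as np

def residual_correlations(responses, theta, a, b, c):
    """Correlate item residuals after a unidimensional 3PL fit (a Q3-style check).

    responses : (n_persons, n_items) array of 0/1 scored responses
    theta     : (n_persons,) unidimensional ability estimates
    a, b, c   : (n_items,) 3PL discrimination, difficulty, and lower-asymptote parameters
    """
    z = a[None, :] * (theta[:, None] - b[None, :])             # logits under a single trait
    p = c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-z))   # model-implied P(correct)
    resid = responses - p                                      # covariation the single trait leaves behind
    return np.corrcoef(resid, rowvar=False)                    # large off-diagonal values signal misfit

# Hypothetical usage: q3 = residual_correlations(X, theta_hat, a_hat, b_hat, c_hat)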

Type: Chapter
Book: Cognitive Diagnostic Assessment for Education: Theory and Applications, pp. 319–340
Publisher: Cambridge University Press
Print publication year: 2007

References

Ackerman, T.A. (2005). Multidimensional item response theory modeling. In Maydeu-Olivares, A., & McArdle, J.J. (Eds.), Contemporary psychometrics (pp. 3–25). Mahwah, NJ: Erlbaum.
Ackerman, T.A., Gierl, M.J., & Walker, C. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22, 37–53.
Cliff, N. (1987). Analyzing multivariate data. San Diego: Harcourt Brace Jovanovich.
Drasgow, F., Luecht, R.M., & Bennett, R.E. (in press). Technology in testing. In Brennan, R.L. (Ed.), Educational measurement (4th ed., pp. 471–515). Washington, DC: American Council on Education and Praeger Publishers.
Ferrara, S., Duncan, T., Perie, M., Freed, R., McGivern, J., & Chilukuri, R. (2003, April). Item construct validity: Early results from a study of the relationship between intended and actual cognitive demands in a middle school science assessment. Paper presented at the annual meeting of the American Educational Research Association, San Diego.
Folske, J.C., Gessaroli, M.E., & Swanson, D.B. (1999, April). Assessing the utility of an IRT-based method for using collateral information to estimate subscores. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.
Gorsuch, R.L. (1983). Factor analysis (2nd ed.). Mahwah, NJ: Erlbaum.
Hambleton, R.K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
Huff, K. (2004, April). A practical application of evidence-centered design principles: Coding items for skills. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Irvine, S.H., & Kyllonen, P.C. (2002). Item generation for test development. Mahwah, NJ: Erlbaum.
Jöreskog, K.G. (1969). Efficient estimation in image factor analysis. Psychometrika, 34, 183–202.
Leighton, J.P. (2004). Avoiding misconceptions, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23, 1–10.
Leighton, J.P., Gierl, M.J., & Hunka, S. (2004). The attribute hierarchy method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205–237.
Liu, J., Feigenbaum, M., & Walker, M.E. (2004, April). New SAT and new PSAT/NMSQT Spring 2003 field trial design. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Longford, N.T. (1997). Shrinkage estimation of linear combinations of true scores. Psychometrika, 62, 237–244.
Luecht, R.M. (1993, April). A marginal maximum likelihood approach to deriving multidimensional composite abilities under the generalized partial credit model. Paper presented at the annual meeting of the American Educational Research Association, Atlanta.
Luecht, R.M. (2001, April). Capturing, codifying and scoring complex data for innovative, computer-based items. Paper presented at the annual meeting of the National Council on Measurement in Education, Seattle.
Luecht, R.M. (2002, April). From design to delivery: Engineering the mass production of complex performance assessments. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans.
Luecht, R.M. (2005a, April). Extracting multidimensional information from multiple-choice question distractors for diagnostic scoring. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Quebec, Canada.
Luecht, R.M. (2005b). Item analysis. In Everitt, B.S., & Howell, D.C. (Eds.), Encyclopedia of statistics in behavioral science. London: Wiley.
Luecht, R.M. (2006, February). Computer-based approaches to diagnostic assessment. Invited presentation at the annual meeting of the Association of Test Publishers, Orlando, FL.
Luecht, R.M., Gierl, M.J., Tan, X., & Huff, K. (2006, April). Scalability and the development of useful diagnostic scales. Paper presented at the annual meeting of the National Council on Measurement in Education, San Francisco.
Mislevy, R.J., & Riconscente, M.M. (2005, July). Evidence-centered assessment design: Layers, structures, and terminology (PADI Technical Report 9). Menlo Park, CA: SRI International.
Nishisato, S. (1984). A simple application of a quantification method. Psychometrika, 49, 25–36.
O'Callaghan, R.K., Morley, M.E., & Schwartz, A. (2004, April). Developing skill categories for the SAT math section. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
VanderVeen, A. (2004, April). Toward a construct of critical reading for the new SAT. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego.
Vevea, J.L., Edwards, M.C., & Thissen, D. (2002). User's guide for AUGMENT v.2: Empirical Bayes subscore augmentation software. Chapel Hill, NC: L.L. Thurstone Psychometric Laboratory.
Wainer, H., Vevea, J.L., Camacho, F., Reeve, B.B., Rosa, K., Nelson, L., Swygert, K.A., & Thissen, D. (2001). Augmented scores – “Borrowing strength” to compute scores based upon small numbers of items. In Wainer, H., & Thissen, D. (Eds.), Test scoring (pp. 343–387). Mahwah, NJ: Erlbaum.
