
2 - The Demand for Cognitive Diagnostic Assessment

Published online by Cambridge University Press: 23 November 2009

Kristen Huff, Senior Director, K-12 Research & Psychometrics, The College Board, New York
Dean P. Goodman, Assessment Consultant
Jacqueline Leighton, University of Alberta
Mark Gierl, University of Alberta

Summary

In this chapter, we explore the nature of the demand for cognitive diagnostic assessment (CDA) in K–12 education and suggest that the demand originates from two sources: assessment developers who are arguing for radical shifts in the way assessments are designed, and the intended users of large-scale assessments who want more instructionally relevant results from these assessments. We first highlight various themes from the literature on CDA that illustrate the demand for CDA among assessment developers. We then outline current demands for diagnostic information from educators in the United States by reviewing results from a recent national survey we conducted on this topic. Finally, we discuss some ways that assessment developers have responded to these demands and outline some issues that, based on the demands discussed here, warrant further attention.

THE DEMAND FOR COGNITIVE DIAGNOSTIC ASSESSMENT FROM ASSESSMENT DEVELOPERS

To provide the context for assessment developers' call for a revision of contemporary assessment practices that, on the whole, do not operate within a cognitive framework, we offer a perspective on existing CDA literature, and we outline the differences between psychometric and cognitive approaches to assessment design. The phrases working within a cognitive framework, cognitively principled assessment design, and cognitive diagnostic assessment are used interchangeably throughout this chapter. They can be generally defined as the joint practice of using cognitive models of learning as the basis for principled assessment design and reporting assessment results with direct regard to informing learning and instruction.

Type: Chapter
In: Cognitive Diagnostic Assessment for Education: Theory and Applications, pp. 19–60
Publisher: Cambridge University Press
Print publication year: 2007


