
7 - Test Construction and Diagnostic Testing

Published online by Cambridge University Press: 23 November 2009

Joanna S. Gorin, Assistant Professor of Psychology in Education, Arizona State University
Jacqueline Leighton, University of Alberta
Mark Gierl, University of Alberta

Summary

Among the many test uses listed in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999), diagnosis is perhaps the most complex. Assessment for diagnosis transforms quantitative data into rich qualitative descriptions of individuals' cognitive abilities, psychological pathologies, and personalities. Diagnostic tests have historically been applied in psychological assessment for psychiatric and neuropsychiatric diagnosis, with fewer examples of educational tests designed for this purpose. More commonly, educational testing has focused on purposes such as rating, selection, placement, competency, and outcome evaluation. Consequently, the test development procedures described in most of the educational assessment literature pertain to test construction for these purposes. Recently, however, educators have come to recognize such assessments as missed opportunities to inform educational decisions. Nowhere is this realization more evident than in the No Child Left Behind (NCLB) Act of 2001 in the United States, specifically as it pertains to the development and use of yearly standardized achievement tests.

Such assessments shall produce individual student interpretive, descriptive, and diagnostic reports … that allow parents, teachers, and principals to understand and address the specific academic needs of students, and include information regarding achievement on academic assessments aligned with State academic achievement standards, and that are provided to parents, teachers, and principals as soon as is practicably possible after the assessment is given, in an understandable and uniform format, and to the extent practicable, in a language that parents can understand. (NCLB, Part A, Subpart 1, Sec. 2221[b]3[C][xii], 2001)

Type: Chapter
In: Cognitive Diagnostic Assessment for Education: Theory and Applications, pp. 173–202
Publisher: Cambridge University Press
Print publication year: 2007


References

American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: AERA.
Behrens, J.T., Mislevy, R.J., Bauer, M., Williamson, D.M., & Levy, R. (2004). Introduction to evidence-centered design and lessons learned from its application in a global e-learning program. International Journal of Testing, 4, 295–302.
Bennett, R.E., Jenkins, F., Persky, H., & Weiss, A. (2003). Assessing complex problem solving performances. Assessment in Education: Principles, Policy, & Practice, 10(3), 347–359.
Briggs, D.C., Alonzo, A.C., Schwab, C., & Wilson, M. (2006). Diagnostic assessment with ordered multiple-choice items. Educational Assessment, 11(1), 33–63.
Champagne, A., Bergin, K., Bybee, R., Duschl, R., & Gallagher, J. (2004). NAEP 2009 science framework development: Issues and recommendations. Washington, DC: National Assessment Governing Board.
Cross, D.R., & Paris, S.G. (1987). Assessment of reading comprehension: Matching test purposes and test properties. Educational Psychologist, 22(3/4), 313–332.
DeVellis, R.F. (1991). Scale development: Theory and applications. Thousand Oaks, CA: Sage.
Diehl, K.A. (2004). Algorithmic item generation and problem solving strategies in matrix completion problems. Dissertation Abstracts International: Section B: The Sciences and Engineering, 64(8-B), 4075.
Embretson, S.E. (1994). Application of cognitive design systems to test development. In Reynolds, C.R. (Ed.), Cognitive assessment: A multidisciplinary perspective (pp. 107–135). New York: Plenum Press.
Embretson, S.E. (1998). A cognitive design system approach to generating valid tests: Application to abstract reasoning. Psychological Methods, 3, 380–396.
Embretson, S.E. (1999). Generating items during testing: Psychometric issues and models. Psychometrika, 64(4), 407–433.
Embretson, S.E., & Gorin, J.S. (2001). Improving construct validity with cognitive psychology principles. Journal of Educational Measurement, 38(4), 343–368.
Ericsson, K.A., & Simon, H.A. (1993). Protocol analysis: Verbal reports as data (Rev. ed.). Cambridge, MA: MIT Press.
Frederiksen, N. (1990). Introduction. In Frederiksen, N., Glaser, R., Lesgold, A., & Shafto, M.G. (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. ix–xvii). Hillsdale, NJ: Erlbaum.
Gertner, A., & VanLehn, K. (2000). Andes: A coached problem solving environment for physics. In Gauthier, G., Frasson, C., & VanLehn, K. (Eds.), Intelligent tutoring systems: 5th International Conference, ITS 2000 (pp. 131–142). Berlin: Springer.
Gorin, J.S. (2006). Using alternative data sources to inform item difficulty modeling. Paper presented at the 2006 annual meeting of the National Council on Measurement in Education, San Francisco, CA.
Hartz, S.M. (2002). A Bayesian framework for the unified model for assessing cognitive abilities: Blending theory with practicality. Dissertation Abstracts International: Section B: The Sciences and Engineering, 63(2-B), 864.
Henson, R., & Douglas, J. (2005). Test construction for cognitive diagnosis. Applied Psychological Measurement, 29(4), 262–277.
Leighton, J.P. (2004). Avoiding misconception, misuse, and missed opportunities: The collection of verbal reports in educational achievement testing. Educational Measurement: Issues and Practice, 23(4), 6–15.
Leighton, J.P., Gierl, M.J., & Hunka, S.M. (2004). The attribute hierarchy method for cognitive assessment: A variation on Tatsuoka's rule-space approach. Journal of Educational Measurement, 41, 205–237.
Marshall, S.P. (1981). Sequential item selection: Optimal and heuristic policies. Journal of Mathematical Psychology, 23, 134–152.
Marshall, S.P. (1990). Generating good items for diagnostic tests. In Frederiksen, N., Glaser, R., Lesgold, A., & Shafto, M. (Eds.), Diagnostic monitoring of skill and knowledge acquisition (pp. 433–452). Hillsdale, NJ: Erlbaum.
Martin, J., & VanLehn, K. (1995). A Bayesian approach to cognitive assessment. In Nichols, P.D., Chipman, S.F., & Brennan, R.L. (Eds.), Cognitively diagnostic assessment (pp. 141–167). Hillsdale, NJ: Erlbaum.
Messick, S. (1989). Validity. In Linn, R.L. (Ed.), Educational measurement (3rd ed., pp. 13–103). New York: American Council on Education/Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. American Psychologist, 50, 741–749.
Mislevy, R.J. (1994). Evidence and inference in educational assessment. Psychometrika, 59, 439–483.
Mislevy, R.J., Steinberg, L.S., & Almond, R.G. (2002). Design and analysis in task-based language assessment. Language Testing. Special Issue: Interpretations, intended uses, and designs in task-based language, 19(4), 477–496.
Netemeyer, R.G., Bearden, W.O., & Sharma, S. (2003). Scaling procedures: Issues and applications. Thousand Oaks, CA: Sage.
No Child Left Behind Act of 2001, H.R. 1, 107th Cong. (2001).
Pek, P.K., & Poh, K.L. (2004). A Bayesian tutoring system for Newtonian mechanics: Can it adapt to different learners? Journal of Educational Computing Research, 31(3), 281–307.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372–422.
Rayner, K., Warren, T., Juhasz, B.J., & Liversedge, S.P. (2004). The effect of plausibility on eye movements in reading. Journal of Experimental Psychology: Learning, Memory, & Cognition, 30, 1290–1301.
Sireci, S.G., & Zenisky, A.L. (2006). Innovative item formats in computer-based testing: In pursuit of improved construct representation. In Downing, S.M. & Haladyna, T.M. (Eds.), Handbook of test development (pp. 329–357). Mahwah, NJ: Erlbaum.
Tatsuoka, K.K. (1985). A probabilistic model for diagnosing misconceptions by the pattern classification approach. Journal of Educational Statistics, 10(1), 55–73.
Tatsuoka, K.K. (1995). Architecture of knowledge structures and cognitive diagnosis: A statistical pattern recognition and classification approach. In Nichols, P.D., Chipman, S.F., & Brennan, R.L. (Eds.), Cognitively diagnostic assessment (pp. 327–361). Hillsdale, NJ: Erlbaum.
Tatsuoka, K.K., Corter, J.E., & Tatsuoka, C. (2004). Patterns of diagnosed mathematical content and process skills in TIMSS-R across a sample of 20 countries. American Educational Research Journal, 41(4), 901–926.
Underwood, G., Jebbett, L., & Roberts, K. (2004). Inspecting pictures for information to verify a sentence: Eye movements in general encoding and in focused search. Quarterly Journal of Experimental Psychology A: Human Experimental Psychology, 57A, 165–182.
U.S. Department of Education. (2003). NAEP validity studies: Implications of electronic technology for the NAEP assessment (Working Paper No. 2003–16). Washington, DC: Institute of Education Sciences.
VanLehn, K. (2001). OLAE: A Bayesian performance assessment for complex problem solving. Paper presented at the 2001 annual meeting of the National Council on Measurement in Education, Seattle.
Williamson, D.M., Bauer, M., Steinberg, L.S., Mislevy, R.J., Behrens, J.T., & DeMark, S.F. (2004). Design rationale for a complex performance assessment. International Journal of Testing, 4(4), 303–332.
Wilson, M. (2005). Constructing measures: An item response modeling approach. Mahwah, NJ: Erlbaum.
Wilson, M., & Sloane, K. (2000). From principles to practice: An embedded assessment system. Applied Measurement in Education, 13, 181–208.
