
Item response theory and the measurement of psychiatric constructs: some empirical and conceptual issues and challenges

S. P. Reise and A. Rodriguez


Item response theory (IRT) measurement models are now commonly used in educational, psychological, and health-outcomes measurement, but their impact on the evaluation of measures of psychiatric constructs remains limited. Herein we present two somewhat contradictory theses. The first is that, when skillfully applied, IRT has much to offer psychiatric measurement in terms of scale development, psychometric analysis, and scoring. The second, however, is that psychiatric measurement presents some unique challenges to the application of IRT, challenges that conventional IRT models and methods may not easily address. These challenges include, but are not limited to, modeling conceptually narrow constructs and their associated limited item pools, and modeling unipolar constructs, for which the expected latent trait distribution is highly skewed.


Corresponding author

Address for correspondence: S. P. Reise, Ph.D., Department of Psychology, UCLA, Franz Hall, Los Angeles, CA 90095, USA. (Email:




