
Psychometric Properties of the NIH Toolbox Cognition Battery in Healthy Older Adults: Reliability, Validity, and Agreement with Standard Neuropsychological Tests

  • Emmi P. Scott (a1), Anne Sorrell (a2) and Andreana Benitez (a1)



Few independent studies have examined the psychometric properties of the NIH Toolbox Cognition Battery (NIHTB-CB) in older adults, despite growing interest in its use for clinical purposes. In this paper, we report the test–retest reliability and construct validity of the NIHTB-CB, as well as its agreement (concordance) with traditional neuropsychological tests of the same constructs, to determine whether the two sets of tests can be used interchangeably.


Sixty-one cognitively healthy adults aged 60–80 years completed “gold standard” (GS) neuropsychological tests, the NIHTB-CB, and brain MRI. Test–retest reliability, convergent/discriminant validity, and agreement statistics were calculated using Pearson’s correlations, concordance correlation coefficients (CCC), and root mean square deviations.
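As a rough illustration of the agreement statistics named above, the sketch below computes Pearson's r, Lin's concordance correlation coefficient, and a root mean square deviation for two score vectors. The function names and the toy scores are hypothetical and are not taken from the study. The CCC here uses population (1/n) variances, which is an assumption about the estimator, not a statement of how the authors computed it.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: linear association, blind to constant offsets."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def concordance_cc(x, y):
    """Lin's CCC with population (1/n) moments: penalizes mean and scale shifts."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)

def rmsd(x, y):
    """Root mean square deviation between paired scores, in score units."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

# Hypothetical standard scores: the second test reads 5 points higher throughout.
gs = [95, 100, 105, 110, 115]
nihtb = [g + 5 for g in gs]

print(pearson_r(gs, nihtb))       # perfect linear association
print(concordance_cc(gs, nihtb))  # lower, because of the constant offset
print(rmsd(gs, nihtb))            # typical size of the disagreement
```

In this toy case the correlation is perfect while the CCC is not, which is why a high correlation alone does not establish that two tests are interchangeable.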


Test–retest reliability was acceptable (CCC = .73 Fluid; CCC = .85 Crystallized). The NIHTB-CB Fluid Composite correlated significantly with cerebral volumes (r’s = |.35−.41|), and both composites correlated highly with their respective GS composites (r’s = .58−.84), although these correlations were more variable for individual tests. Absolute agreement was generally lower (CCC = .55 Fluid; CCC = .70 Crystallized) due to lower precision in fluid scores and systematic overestimation of crystallized composite scores on the NIHTB-CB.
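The two sources of disagreement named above correspond to a standard decomposition of Lin's concordance correlation coefficient into a precision term and an accuracy term. The notation below ($x$ for GS scores, $y$ for NIHTB-CB scores) is introduced here for illustration and is not taken from the paper.

```latex
\rho_c \;=\; \frac{2\,\sigma_{xy}}{\sigma_x^{2} + \sigma_y^{2} + (\mu_x - \mu_y)^{2}}
       \;=\; \rho \cdot C_b ,
\qquad
C_b \;=\; \frac{2}{\,v + 1/v + u^{2}\,},
\quad
v = \frac{\sigma_x}{\sigma_y},
\quad
u = \frac{\mu_x - \mu_y}{\sqrt{\sigma_x \sigma_y}} .
```

Here $\rho$ is Pearson's correlation (precision) and $C_b \le 1$ is the bias correction factor (accuracy). A weaker correlation lowers $\rho_c$ even when the means match, whereas a systematic offset between the means lowers $C_b$, and hence $\rho_c$, even when $\rho$ is high.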


These results support the reliability and validity of the NIHTB-CB in healthy older adults and suggest that the fluid composite tests are at least as sensitive as standard neuropsychological tests to medial temporal atrophy and ventricular expansion. However, the NIHTB-CB may generate different estimates of performance and should not be treated as interchangeable with established neuropsychological tests.


Corresponding author

*Correspondence and reprint requests to: Emmi P. Scott, Department of Neurology, Medical University of South Carolina, 96 Jonathan Lucas Street MSC 323, Charleston, SC 29425, USA. E-mail:




