
Three-month Practice Effect of the National Institutes of Health Toolbox Cognition Battery in Young Healthy Adults

Published online by Cambridge University Press:  08 July 2022

Leah E. Kuzmuk
Faculty of Medicine, University of British Columbia, Vancouver, Canada
Alexander D. Rebchuk
Faculty of Medicine, University of British Columbia, Vancouver, Canada; Division of Neurosurgery, University of British Columbia, Vancouver, Canada
Halina M. Deptuck
Faculty of Education, University of British Columbia, Vancouver, Canada
Molly Cairncross
Department of Psychology, University of British Columbia, Vancouver, Canada; Rehabilitation Research Program, Vancouver Coastal Health Research Institute, Vancouver, Canada
Noah D. Silverberg
Department of Psychology, University of British Columbia, Vancouver, Canada; Rehabilitation Research Program, Vancouver Coastal Health Research Institute, Vancouver, Canada
Thalia S. Field*
Faculty of Medicine, University of British Columbia, Vancouver, Canada; Division of Neurology, University of British Columbia, Vancouver, Canada; Djavad Mowafaghian Centre for Brain Health, University of British Columbia, Vancouver, Canada; Vancouver Stroke Program, University of British Columbia, Vancouver, Canada
Corresponding author: Dr. Thalia Field, S169-2211 Wesbrook Mall, Vancouver BC V6T 2B5, Canada. Email:


The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a tablet-based cognitive assessment intended for individuals of all ages with neurological diseases. NIHTB-CB practice effects (PEs), however, need clarification if this measure is to be used to track longitudinal change. We explored the test–retest PEs on NIHTB-CB performance at 3 months in young healthy adults (n = 22). We examined corrected T-scores normalized for demographic factors and calculated PEs using Cohen’s d. There were significant PEs for all NIHTB-CB composite scores and on 4/7 subtests. This work suggests the need to further assess NIHTB-CB PEs, as they may affect the interpretation of results from studies incorporating this battery.

Résumé (translated from French):

Practice effects of the National Institutes of Health Toolbox Cognition Battery in healthy young adults over a three-month period.

The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a tablet-based cognitive assessment intended for people of all ages with neurological diseases. The practice effects of this assessment tool nonetheless need to be clarified when it is used to track longitudinal change. We therefore explored test–retest effects on NIHTB-CB performance at three months in healthy young adults (n = 22). We examined corrected T-scores normalized for demographic factors and calculated practice effects using Cohen’s d. Significant effects emerged for all NIHTB-CB composite scores and for 4 of 7 subtests. This work therefore suggests the need to further evaluate NIHTB-CB practice effects, as they may affect the interpretation of results from studies that incorporate this tool.

Brief Communication
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
© The Author(s), 2022. Published by Cambridge University Press on behalf of Canadian Neurological Sciences Federation

The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a brief tablet-based assessment of cognitive functioning designed for use across the lifespan [1]. The NIHTB-CB provides common data elements for clinical research in evaluating cognition and has been validated for use in many neurological disorders [2,3]. The NIHTB-CB has seven subtests and produces scores normalized for age, gender, education and race–ethnicity for Fluid Cognition, Crystallized Cognition, and Total Cognition.

Practice effects (PEs), or improvements in test performance due to repeated exposure to testing materials, have been investigated for the NIHTB-CB in a small body of literature over short (1–5 weeks) [4–6] and longer-term (15 months) [7] intervals in middle-aged and older adults. These studies did not find consistent evidence of PEs in these particular cohorts over these time intervals. It is unclear whether these findings can be extrapolated to younger adults, who may have better cognitive reserve in the context of a neurological insult [8].

Recent work from our group has demonstrated that the NIHTB-CB detects cognitive deficits in high-functioning young stroke survivors with normal scores on the Montreal Cognitive Assessment [9]. We conducted an exploratory pilot in the healthy control group of the aforementioned study to assess possible PEs as a consideration in future research with young adults. We investigated a 3-month test–retest interval as 90-day outcomes are commonly assessed in acute stroke trials. Further, many stroke survivors will experience persisting cognitive impairment at 3 months despite an excellent functional recovery [10,11]. We expected higher retest performance for Fluid Cognition and Total Cognition scores. We expected stable retest performance for Crystallized Cognition scores.

We recruited healthy adults aged 18–55 years. Further eligibility criteria included fluency in English, normal use of one’s dominant hand, no history of neurological or psychiatric disease, no diagnosed learning disability, and no prior exposure to the NIHTB-CB. Participants were recruited beginning in November 2017 by advertisement at a local academic hospital. The intended sample size was a convenience sample of 50 participants, who were serving as controls for a study in young stroke survivors [9]. This provided over 80% power a priori with a two-tailed alpha of 0.05 to detect an estimated effect size of 0.25 based on assumptions from previous test–retest work [6,12]. Recruitment was still open in March 2020, but due to restrictions related to the COVID-19 pandemic, we completed the study early as an exploratory pilot with 22 participants.

The NIHTB-CB has seven subtests measuring five major cognitive domains: language, executive function, episodic memory, processing speed, and working memory [1]. In addition to reporting performance on individual tests, subtests are aggregated as measures of Crystallized Cognition (Picture Vocabulary and Oral Reading Recognition) and Fluid Cognition (Flanker Inhibitory Control and Attention, List Sorting Working Memory, Dimensional Change Card Sort, Pattern Comparison Processing Speed, and Picture Sequence Memory). Performance is adjusted for demographic factors, including age, education, gender, and race–ethnicity. Subtest and composite scores are reported as fully corrected T-scores (mean: 50, SD: 10) [13].
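For readers less familiar with the T-score metric, the scaling described above (mean 50, SD 10) can be illustrated with a minimal sketch. It shows only the generic conversion between a standardized z-score and a T-score; it does not reproduce the demographic correction itself, which relies on published normative standards. The function names are hypothetical and chosen for illustration.

```python
def t_from_z(z: float) -> float:
    """Convert a standardized z-score to a T-score (mean 50, SD 10)."""
    return 50.0 + 10.0 * z


def z_from_t(t: float) -> float:
    """Convert a T-score (mean 50, SD 10) back to a z-score."""
    return (t - 50.0) / 10.0


# Illustrative values only (not study data):
average_score = t_from_z(0.0)      # performance at the normative mean
one_sd_above = t_from_z(1.0)       # one SD above the normative mean
below_average_z = z_from_t(35.0)   # a T-score of 35 expressed in SD units
```

On this scale, a 5-point change in T-score corresponds to half a standard deviation of the normative distribution.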

A trained research assistant administered the NIHTB-CB on a 9.7” iPad Pro (Apple, CA) in a quiet, distraction-free room. The assessment was administered in English. Administration time was approximately 30 minutes. Assessments were completed under the same test conditions with a 3-month (±2 weeks) test–retest interval.

The study protocol was approved by the local institutional review boards and the research ethics committee at the University of British Columbia. Written informed consent was obtained from all participants.

Statistical analysis was completed using IBM SPSS Statistics version 26.0 (Armonk, NY). Demographic data are reported as descriptive statistics. Test–retest comparisons were made using paired t-tests and Wilcoxon signed-rank tests (two-tailed, p ≤ 0.05) for parametric and non-parametric data, respectively. Test–retest bivariate correlations were computed using Pearson’s correlation coefficient (r). PEs (effect sizes) were calculated using Cohen’s d for repeated measures (within-subject version) with 95% confidence intervals and cutoffs of 0.2, 0.5, and 0.8 for small, moderate, and large effects, respectively [14]. All participant data are included in the analysis.
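As an illustration of the within-subject effect size described above, the following minimal sketch computes one common variant, d_z (the mean of the paired differences divided by their standard deviation), together with the paired t statistic. This is an assumption for illustration: the analysis here was run in SPSS, several within-subject variants of Cohen’s d exist, and the text does not state which was used. The scores below are hypothetical, not study data.

```python
import math
import statistics


def paired_effect(test, retest):
    """Return (d_z, t) for paired scores.

    d_z = mean(differences) / SD(differences)   (assumed variant)
    t   = mean(differences) / (SD(differences) / sqrt(n))
    """
    if len(test) != len(retest):
        raise ValueError("test and retest must have the same length")
    diffs = [after - before for before, after in zip(test, retest)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    d_z = mean_diff / sd_diff
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return d_z, t_stat


# Hypothetical T-scores for six participants (illustration only)
test_scores = [48, 52, 50, 55, 47, 51]
retest_scores = [51, 54, 50, 59, 49, 53]
d_z, t_stat = paired_effect(test_scores, retest_scores)
```

For this variant the two quantities are linked by t = d_z × √n, so a given effect size is more likely to reach significance as the sample grows.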

Twenty-two participants completed both NIHTB-CB assessments over a median interval of 94 (IQR 82–106) days. Median age was 38 (IQR 34–45) years and 55% of participants were women. Self-identified race-ethnicity for most participants was either white (73%) or Asian (18%). Mean level of education was 16.0 (SD 2.5) years (Table 1).

Table 1: Participant demographics (n = 22)

Mean NIHTB-CB scores for Fluid, Crystallized, and Total Cognition were significantly higher at the second compared to the first administration (Table 2, see supplementary materials for participant spaghetti plots). Significant PEs were observed for all three composite cognition scores. Total Cognition had the largest PE (Cohen’s d = 0.8 [95% CI: 0.41–1.18], p < 0.001), with moderate (0.7 [0.28–1.07], p = 0.001) and small (0.4 [0.10–0.74], p = 0.012) PEs observed for Fluid and Crystallized Cognition, respectively. Moderate PEs were seen for both the Flanker Inhibitory Control (0.5 [0.14–0.86], p = 0.007) and Picture Sequence Memory (0.7 [0.09–1.29], p = 0.023) subtests, and small PEs for Picture Vocabulary (0.3 [0.04–0.52], p = 0.032) and Pattern Comparison Processing Speed (0.3 [0.11–0.58], p = 0.006) (Table 2).
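The qualitative labels used above follow from the stated cutoffs of 0.2, 0.5, and 0.8. The sketch below applies those cutoffs to the rounded effect sizes reported in this paragraph. Treating each cutoff as an inclusive lower bound and labelling values below 0.2 as "negligible" are assumptions made for illustration; the function name is hypothetical.

```python
def classify_effect(d: float) -> str:
    """Label an effect size using cutoffs of 0.2, 0.5, and 0.8.

    Assumption: each cutoff is an inclusive lower bound, and values
    below 0.2 are called "negligible" (a label not used in the text).
    """
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


# Rounded Cohen's d values as reported in the text
reported_d = {
    "Total Cognition": 0.8,
    "Fluid Cognition": 0.7,
    "Crystallized Cognition": 0.4,
    "Flanker Inhibitory Control": 0.5,
    "Picture Sequence Memory": 0.7,
    "Picture Vocabulary": 0.3,
    "Pattern Comparison Processing Speed": 0.3,
}

labels = {name: classify_effect(d) for name, d in reported_d.items()}
```

Because the reported values are rounded to one decimal place, scores sitting at a cutoff (such as 0.5 or 0.8) could fall on either side of it at full precision.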

Table 2: NIHTB-CB Performance (fully-corrected T-scores) at test vs. retest over 3-month interval (n = 22)

* p-value for t-test and Wilcoxon rank test.

^ Nonparametric data which used Wilcoxon rank test for test–retest comparisons, all other comparisons were made using a paired t-test.

In this small cohort of young, educated, healthy adults, we found significant 3-month PEs across all composite cognition scores of the NIHTB-CB. Small-to-moderate PEs were also seen across several individual subtests, with the most marked changes in Picture Sequence Memory and Flanker Inhibitory Control.

Our findings differ from previous work examining PEs in the NIHTB-CB in that both Fluid and Crystallized composite scores improved on retesting. Previous work investigating a short test–retest interval of 3 weeks found a small-to-moderate effect size for Fluid Cognition (d = 0.42) and Total Cognition (d = 0.29), but no significant PE for Crystallized Cognition [4]. Our findings are therefore unexpected. While fluid cognition represents dynamic cognitive processes, including working memory and executive function, which are more susceptible to aging or brain injury, crystallized cognition reflects cognitive processes relying on language and comprehension and tends to be stable across the lifespan with greater resilience to brain changes [3,4]. Thus, it is possible that, despite its emphasis on testing crystallized cognition, the design of the Picture Vocabulary subtest may still be sensitive to learning effects, at least in our cohort and over this time interval.

In contrast to 3-week repeated administration of the NIHTB-CB, an extended test–retest period of 15 months in older adults showed no significant PEs on any of the composite cognition scores, and a small effect size for the Dimensional Change Card Sort task [7]. These differences are not unexpected as PEs are typically more prominent with a shorter test–retest interval, and with higher frequency testing [15]. Our results are in line with PEs reported in older adults over a 3–5-week retest interval where significant PEs were seen for both Fluid and Total Cognition scores, as well as the Picture Sequence Memory task [5]. However, it is challenging to compare PE trends between our younger adult sample and older adults as fluid cognition is known to decrease across adulthood in parallel with age-related neurobiological changes [16]. Improved characterization of PEs helps to minimize overestimation of cognitive recovery, or, conversely, underestimation of cognitive deterioration, over time. Our findings suggest that three-month PEs, even on Crystallized Cognition performance, may be a potential consideration in interpreting longitudinal changes in NIHTB-CB scores.

There are limitations to our study. We had a smaller-than-anticipated sample size owing to early termination during the COVID-19 pandemic. Importantly, the demographics of our study group, who were primarily white with post-secondary education, are not representative of the general population. However, cultural and educational backgrounds are thought to affect crystallized cognitive ability more than fluid cognition, so they would not explain the moderate PEs for the Fluid Cognition score reported here [4]. Additionally, early-life education, as well as education in mid-to-late adulthood, has been shown to improve crystallized cognition but does not influence working memory, executive function, or other fluid cognitive abilities [17]. The homogeneous nature of our sample is further demonstrated by the fact that some of the standard deviations for the NIHTB-CB T-scores were small, including for Total Cognition, with a leptokurtic distribution. This may contribute to the moderate-to-large observed PEs on composite scores, despite nonsignificant effect sizes on many individual subtests. We found similar distributions in a previous study examining 3-week PEs in in-person versus virtual test conditions in healthy controls who had similarly high levels of education [6]. We also acknowledge that, although the 3-month interval was chosen with the usual timepoints for acute stroke trials in mind, different follow-up intervals will be preferred for studies of other neurological and psychiatric conditions. Finally, we did not account for other factors such as sleep and stress, which may confound performance.

Despite the limitations of this pilot study, the NIHTB-CB is increasingly used as an outcome measure in studies of neurological and psychiatric disease, and we feel it is important to draw attention to PEs as a potential consideration if the NIHTB-CB is to be used as a repeated measure. Future confirmatory work with larger and more diverse cohorts is warranted.

Our findings suggest that the NIHTB-CB may have lower test–retest reliability in short-term repeated administrations in young, educated, healthy adults. Although preliminary, this work suggests that PEs may need to be considered in studies repeating the NIHTB-CB at 3-month intervals, and that even Fluid Cognition subtests may be subject to PEs in some circumstances. This work may be informative for clinical trials or observational studies using the NIHTB-CB to assess longitudinal cognitive outcomes.

Supplementary Material

For supplementary material accompanying this paper visit


Zoe O’Neill, BSc., medical student at McGill University: assistance in data collection.

Michelle Yuan, BSc., medical student at University of Alberta: assistance in data collection.

Conflict of Interest


Statement of Authorship

LEK was responsible for data collection, statistical analysis, and wrote the original manuscript. ADR assisted with data collection and critically revised the manuscript for intellectual content. HMD assisted with data collection. NDS and MC provided expertise on the NIHTB-CB and critically revised the manuscript for intellectual content. TSF was responsible for the study design and inception and critically revised the manuscript for intellectual content.


1. Weintraub, S, Dikmen, SS, Heaton, RK, et al. Cognition assessment using the NIH Toolbox. Neurology. 2013;80:S54–S64. DOI 10.1212/WNL.0b013e3182872ded.
2. Carlozzi, NE, Goodnight, S, Casaletto, KB, et al. Validation of the NIH Toolbox in individuals with neurologic disorders. Arch Clin Neuropsychol. 2017;32:555–73. DOI 10.1093/arclin/acx020.
3. Carlozzi, NE, Tulsky, DS, Wolf, TJ, et al. Construct validity of the NIH Toolbox Cognition Battery in individuals with stroke. Rehabil Psychol. 2017;62:443–54. DOI 10.1037/rep0000195.
4. Heaton, RK, Akshoomoff, N, Tulsky, D, et al. Reliability and validity of composite scores from the NIH Toolbox Cognition Battery in adults. J Int Neuropsychol Soc. 2014;20:588–98. DOI 10.1017/S1355617714000241.
5. Parsey, CM, Bagger, JE, Trittschuh, EH, Hanson, AJ. Utility of the iPad NIH toolbox cognition battery in a clinical trial of older adults. J Am Geriatr Soc. 2021;69:3519–28. DOI 10.1111/jgs.17382.
6. Rebchuk, AD, Deptuck, HM, O’Neill, ZR, Fawcett, DS, Silverberg, ND, Field, TS. Validation of a novel telehealth administration protocol for the NIH toolbox-cognition battery. Telemed e-Health. 2019;25:237–42. DOI 10.1089/tmj.2018.0023.
7. Scott, EP, Sorrell, A, Benitez, A. Psychometric properties of the NIH toolbox cognition battery in healthy older adults: reliability, validity, and agreement with standard neuropsychological tests. J Int Neuropsychol Soc. 2019;25:857–67. DOI 10.1017/S1355617719000614.
8. Roldán-Tapia, L, García, J, Cánovas, R, León, I. Cognitive reserve, age, and their relation to attentional and executive functions. Appl Neuropsychol Adult. 2012;19:2–8. DOI 10.1080/09084282.2011.595458.
9. Rebchuk, AD, Kuzmuk, LE, Deptuck, HM, Silverberg, ND, Field, TS. Evaluating high-functioning young stroke survivors with cognitive complaints. Can J Neurol Sci. 2021;49:368–72. DOI 10.1017/cjn.2021.137.
10. Jokinen, H, Melkas, S, Ylikoski, R, et al. Post-stroke cognitive impairment is common even after successful clinical recovery. Eur J Neurol. 2015;22:1288–94. DOI 10.1111/ene.12743.
11. Blackburn, DJ, Bafadhel, L, Randall, M, Harkness, KA. Cognitive screening in the acute stroke setting. Age Ageing. 2013;42:113–6. DOI 10.1093/ageing/afs116.
12. Faul, F, Erdfelder, E, Lang, AG, Buchner, A. G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–91. DOI 10.3758/BF03193146.
13. Casaletto, KB, Umlauf, A, Beaumont, J, et al. Demographically corrected normative standards for the English version of the NIH toolbox cognition battery. J Int Neuropsychol Soc. 2015;21:378–91. DOI 10.1017/S1355617715000351.
14. Lakens, D. Calculating and reporting effect sizes to facilitate cumulative science: a practical primer for t-tests and ANOVAs. Front Psychol. 2013;4:1–12. DOI 10.3389/fpsyg.2013.00863.
15. Bartels, C, Wegrzyn, M, Wiedl, A, Ackermann, V, Ehrenreich, H. Practice effects in healthy adults: a longitudinal study on frequent repetitive cognitive testing. BMC Neurosci. 2010;11:118. DOI 10.1186/1471-2202-11-118.
16. Tucker-Drob, EM, de la Fuente, J, Köhncke, Y, Brandmaier, AM, Nyberg, L, Lindenberger, U. A strong dependency between changes in fluid and crystallized abilities in human cognitive aging. Sci Adv. 2022;8:341. DOI 10.1126/sciadv.abj2422.
17. Thow, ME, Summers, MJ, Saunders, NL, Summers, JJ, Ritchie, K, Vickers, JC. Further education improves cognitive reserve and triggers improvement in selective cognitive functions in older adults: the Tasmanian healthy brain project. Alzheimers Dement (Amst). 2018;10:22–30. DOI 10.1016/j.dadm.2017.08.004.