The National Institutes of Health Toolbox Cognition Battery (NIHTB-CB) is a brief tablet-based assessment of cognitive functioning designed for use across the lifespan. Reference Weintraub, Dikmen and Heaton1 The NIHTB-CB provides common data elements for clinical research in evaluating cognition and has been validated for use in many neurological disorders. Reference Carlozzi, Goodnight and Casaletto2,Reference Carlozzi, Tulsky and Wolf3 The NIHTB-CB has seven subtests and produces scores normalized for age, gender, education and race–ethnicity for Fluid cognition, Crystallized cognition, and Total cognition.
Practice effects (PEs), or improvement in test performance due to repeated exposure to testing materials, have been investigated for the NIHTB-CB in a small body of literature over short- (1–5 weeks) Reference Heaton, Akshoomoff and Tulsky4,Reference Parsey, Bagger, Trittschuh and Hanson5,Reference Rebchuk, Deptuck, O’Neill, Fawcett, Silverberg and Field6 and longer-term (15 months) Reference Scott, Sorrell and Benitez7 intervals in middle-aged and older adults. These studies did not find consistent evidence of PEs in these particular cohorts over these time intervals. It is unclear whether these findings can be extrapolated to younger adults, who may have greater cognitive reserve in the context of a neurological insult. Reference Roldán-Tapia, García, Cánovas and León8
Recent work from our group has demonstrated that the NIHTB-CB detects cognitive deficits in high-functioning young stroke survivors with normal scores on the Montreal Cognitive Assessment. Reference Rebchuk, Kuzmuk, Deptuck, Silverberg and Field9 We conducted an exploratory pilot in the healthy control group of the aforementioned study to assess possible PEs as a consideration in future research with young adults. We investigated a 3-month test–retest interval as 90-day outcomes are commonly assessed in acute stroke trials. Further, many stroke survivors will experience persisting cognitive impairment at 3 months despite an excellent functional recovery. Reference Jokinen, Melkas and Ylikoski10,Reference Blackburn, Bafadhel, Randall and Harkness11 We expected higher retest performance for Fluid Cognition scores and Total Cognition. We expected stable retest performance for Crystallized Cognition scores.
We recruited healthy adults aged 18–55 years. Further eligibility criteria included fluency in English, normal use of one’s dominant hand, no history of neurological or psychiatric disease, no diagnosed learning disability, and no prior exposure to the NIHTB-CB. Participants were recruited beginning in November 2017 by advertisement at a local academic hospital. The intended sample size was a convenience sample of 50 participants, who were serving as controls for a study in young stroke survivors. Reference Rebchuk, Kuzmuk, Deptuck, Silverberg and Field9 This provided over 80% power a priori with a two-tailed alpha of 0.05 to detect an estimated effect size of 0.25 based on assumptions from previous test–retest work. Reference Rebchuk, Deptuck, O’Neill, Fawcett, Silverberg and Field6,Reference Faul, Erdfelder, Lang and Buchner12 Recruitment was still open in March 2020, but due to restrictions related to the COVID-19 pandemic, we completed the study early as an exploratory pilot with 22 participants.
The NIHTB-CB has seven subtests measuring five major cognitive domains: language, executive function, episodic memory, processing speed, and working memory. Reference Weintraub, Dikmen and Heaton1 In addition to reporting performance on individual tests, subtests are aggregated as measures of Crystallized Cognition (Picture Vocabulary and Oral Reading Recognition) and Fluid Cognition (Flanker Inhibitory Control and Attention, List Sorting Working Memory, Dimensional Change Card Sort, Pattern Comparison Processing Speed, and Picture Sequence Memory). Performance is adjusted for demographic factors, including age, education, gender, and race–ethnicity. Subtest and composite scores are reported as fully corrected T-scores (mean: 50, SD: 10). Reference Casaletto, Umlauf and Beaumont13
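As a rough illustration of the T-score metric described above (mean 50, SD 10), the following sketch converts between a standardized z-score and a T-score. It shows only the linear scaling; the demographic adjustment used to derive fully corrected NIHTB-CB scores is not reproduced here, and the function names are hypothetical.

```python
T_MEAN = 50.0  # T-score scale mean, as stated in the text
T_SD = 10.0    # T-score scale standard deviation, as stated in the text


def t_from_z(z):
    """Map a z-score (mean 0, SD 1) onto the T-score scale."""
    return T_MEAN + T_SD * z


def z_from_t(t):
    """Map a T-score back to a z-score."""
    return (t - T_MEAN) / T_SD
```

On this scale, a T-score of 60 sits one standard deviation above the normative mean.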
A trained research assistant administered the NIHTB-CB on a 9.7” iPad Pro (Apple, CA) in a quiet, distraction-free room. The assessment was administered in English. Administration time was approximately 30 minutes. Assessments were completed under the same test conditions with a 3-month (±2 weeks) test–retest interval.
The study protocol was approved by the local institutional review boards and the research ethics committee at the University of British Columbia. Written informed consent was obtained from all participants.
Statistical analysis was completed using IBM SPSS Statistics version 26.0 (Armonk, NY). Demographic data are reported as descriptive statistics. Test–retest comparisons were made using paired t-tests and Wilcoxon signed-rank tests (two-tailed, p ≤ 0.05) for parametric and non-parametric data, respectively. Test–retest bivariate correlations were calculated using Pearson’s correlation coefficient (r). PEs (effect sizes) were calculated using Cohen’s d for repeated measures (within-subject version) with 95% confidence intervals and cutoffs of 0.2, 0.5, and 0.8 for small, moderate, and large effects, respectively. Reference Lakens14 All participant data are included in the analysis.
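The paired quantities named above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS procedure used in the study: the text does not specify which within-subject variant of Cohen's d was computed, so the sketch reports two common ones (d_z, based on the SD of the difference scores, and d_av, based on the average of the two session SDs). The function names, the "negligible" label, and the example scores are hypothetical.

```python
import math
import statistics


def paired_summary(test, retest):
    """Paired t statistic, Pearson r, and two within-subject Cohen's d variants."""
    if len(test) != len(retest) or len(test) < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")
    n = len(test)
    diffs = [b - a for a, b in zip(test, retest)]  # retest minus test
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the difference scores
    m1, m2 = statistics.mean(test), statistics.mean(retest)
    sd1, sd2 = statistics.stdev(test), statistics.stdev(retest)
    # Pearson correlation between the two sessions
    cov = sum((a - m1) * (b - m2) for a, b in zip(test, retest)) / (n - 1)
    r = cov / (sd1 * sd2)
    return {
        "t": mean_diff / (sd_diff / math.sqrt(n)),  # paired t, df = n - 1
        "r": r,
        "d_z": mean_diff / sd_diff,                 # standardized by SD of differences
        "d_av": mean_diff / ((sd1 + sd2) / 2),      # standardized by average session SD
    }


def label_effect(d):
    """Apply the 0.2 / 0.5 / 0.8 cutoffs; 'negligible' is an assumed label."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"


# Hypothetical T-scores for six participants (not study data)
first = [48, 52, 50, 55, 45, 60]
second = [50, 55, 51, 59, 46, 64]
summary = paired_summary(first, second)
print(summary, label_effect(summary["d_z"]))
```

Note that d_z and the paired t statistic are linked by d_z = t / sqrt(n), so d_z grows as the test–retest correlation increases, whereas d_av does not depend on that correlation. The two variants can therefore differ substantially for the same data.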
Twenty-two participants completed both NIHTB-CB assessments over a median interval of 94 (IQR 82–106) days. Median age was 38 (IQR 34–45) years and 55% of participants were women. Self-identified race–ethnicity for most participants was either white (73%) or Asian (18%). Mean level of education was 16.0 (SD 2.5) years (Table 1).
Mean NIHTB-CB scores for Fluid, Crystallized, and Total Cognition were significantly higher at the second compared to the first administration (Table 2, see supplementary materials for participant spaghetti plots). Significant PEs were observed for all three composite cognition scores. Total Cognition had the largest PE (Cohen’s d = 0.8 [95% CI: 0.41–1.18], p < 0.001), with moderate (0.7 [0.28–1.07], p = 0.001) and small (0.4 [0.10–0.74], p = 0.012) PEs observed for Fluid and Crystallized Cognition, respectively. Moderate PEs were seen for both the Flanker Inhibitory Control (0.5 [0.14–0.86], p = 0.007) and Picture Sequence Memory (0.7 [0.09–1.29], p = 0.023) subtests, and small PEs for Picture Vocabulary (0.3 [0.04–0.52], p = 0.032) and Pattern Comparison Processing Speed (0.3 [0.11–0.58], p = 0.006) (Table 2).
* p-value for paired t-test or Wilcoxon signed-rank test.
^ Nonparametric data, compared using the Wilcoxon signed-rank test; all other test–retest comparisons used a paired t-test.
In this small cohort of young, educated, healthy adults, we found significant 3-month PEs across all composite cognition scores of the NIHTB-CB. Small-to-moderate PEs were also seen across several individual subtests, with the most marked changes in Picture Sequence Memory and Flanker Inhibitory Control.
Our findings differ from previous work examining PEs in the NIHTB-CB as both Fluid and Crystallized composite scores improved on re-testing. Previous work investigating a short test–retest interval of 3 weeks found a small to moderate effect size for Fluid Cognition (d = 0.42) and Total Cognition (d = 0.29), but no significant PE for Crystallized Cognition. Reference Heaton, Akshoomoff and Tulsky4 Our findings are unexpected. While fluid cognition represents dynamic cognitive processes, including working memory and executive function, which are more susceptible to aging or brain injury, crystallized cognition reflects cognitive processes relying on language and comprehension and tends to be stable across the lifespan with greater resilience to brain changes. Reference Carlozzi, Tulsky and Wolf3,Reference Heaton, Akshoomoff and Tulsky4 Thus, it is possible that, despite its emphasis on testing crystallized cognition, the design of the Picture Vocabulary subtest may still be sensitive to learning effects, at least amongst our cohort and over this time interval.
In contrast to 3-week repeated administration of the NIHTB-CB, an extended test–retest period of 15 months in older adults showed no significant PEs on any of the composite cognition scores, and a small effect size for the Dimensional Change Card Sort task. Reference Scott, Sorrell and Benitez7 These differences are not unexpected as PEs are typically more prominent with a shorter test–retest interval and with higher-frequency testing. Reference Bartels, Wegrzyn, Wiedl, Ackermann and Ehrenreich15 Our results are in line with PEs reported in older adults over a 3–5-week retest interval, where significant PEs were seen for both Fluid and Total Cognition scores, as well as the Picture Sequence Memory task. Reference Parsey, Bagger, Trittschuh and Hanson5 However, it is challenging to compare PE trends between our younger adult sample and older adults as fluid cognition is known to decrease across adulthood in parallel with age-related neurobiological changes. Reference Tucker-Drob, de la Fuente, Köhncke, Brandmaier, Nyberg and Lindenberger16 Improved characterization of PEs helps to minimize overestimation of cognitive recovery, or, conversely, underestimation of cognitive deterioration, over time. Our findings suggest that three-month PEs, even on Crystallized Cognition performance, may be a potential consideration in interpreting longitudinal changes in NIHTB-CB scores.
There are limitations to our study. We had a smaller-than-anticipated sample size due to early termination during the COVID-19 pandemic. Importantly, the demographics of our study group, who are primarily white with post-secondary education, are not representative of the general population. However, cultural and educational backgrounds are thought to impact crystallized cognitive ability more than fluid cognition, which would not explain the moderate PEs for the Fluid Cognition score reported here. Reference Heaton, Akshoomoff and Tulsky4 Additionally, early-life education, as well as education in mid-to-late adulthood, has been shown to improve crystallized cognition, but does not influence working memory, executive function, or other fluid cognitive abilities. Reference Thow, Summers, Saunders, Summers, Ritchie and Vickers17 The homogeneous nature of our sample is further demonstrated by the fact that some of the standard deviations for the NIHTB-CB T-scores were small (for example, for Total Cognition), with leptokurtic distributions. This may contribute to the moderate-to-large observed PEs on composite scores, despite nonsignificant effect sizes on many individual subtests. We found similar distributions in a previous study examining 3-week PEs in in-person versus virtual test conditions in healthy controls who had similarly high levels of education. Reference Rebchuk, Deptuck, O’Neill, Fawcett, Silverberg and Field6 We also acknowledge that, although the 3-month interval was chosen to reflect usual timepoints for trials in acute stroke, different follow-up intervals will be preferred for studies of different neurological and psychiatric conditions. Finally, we did not account for other factors such as sleep and stress, which may confound performance.
Despite the limitations of this pilot study, the NIHTB-CB is increasingly used as an outcome measure in studies of neurological and psychiatric disease, so we feel it is important to draw attention to PEs as a potential consideration if the NIHTB-CB is to be used as a repeated measure. Future confirmatory work with larger and more diverse cohorts is warranted.
Our findings suggest that the NIHTB-CB may have lower test–retest reliability in short-term repeated administrations in young, educated, healthy adults. Although preliminary, this work suggests that PEs may need to be considered in studies repeating the NIHTB-CB at 3-month intervals, and that even Crystallized Cognition subtests may be subject to PEs in some circumstances. This work may be informative for clinical trials or observational studies using the NIHTB-CB to assess longitudinal cognitive outcomes.
For supplementary material accompanying this paper visit https://doi.org/10.1017/cjn.2022.273
Zoe O’Neill, BSc., medical student at McGill University: assistance in data collection.
Michelle Yuan, BSc., medical student at University of Alberta: assistance in data collection.
Conflict of Interest
Statement of Authorship
LEK was responsible for data collection and statistical analysis and wrote the original manuscript. ADR assisted with data collection and critically revised the manuscript for intellectual content. HMD assisted with data collection. NDS and MC provided expertise on the NIHTB-CB and critically revised the manuscript for intellectual content. TSF was responsible for the study design and inception and critically revised the manuscript for intellectual content.