Cognition is an important outcome in research trials and clinical practice (McInnes et al., Reference McInnes, Friesen, MacKenzie, Westwood and Boe2017; Sheffield et al., Reference Sheffield, Karcher and Barch2018; Tang et al., Reference Tang, Amiesimaka, Harrison, Green, Price, Robinson and Stephan2018). To provide a common metric of cognition in the context of clinical research, the NIH Toolbox Cognition Battery (NIHTB-CB) was introduced. It is a brief, tablet-based cognitive assessment that has been validated for use in healthy populations and those with neurological and psychiatric disease (Carlozzi, Goodnight et al., Reference Carlozzi, Goodnight, Casaletto, Goldsmith, Heaton, Wong and Tulsky2017; Carlozzi, Tulsky et al., Reference Carlozzi, Tulsky, Wolf, Goodnight, Heaton, Casaletto and Heinemann2017; Weintraub et al., Reference Weintraub, Dikmen, Heaton, Tulsky, Zelazo, Slotkin and Gershon2014).
The NIHTB-CB comprises seven instruments: two assessing crystallized cognition (Picture Vocabulary and Oral Reading Recognition) and five assessing fluid cognition (Flanker Inhibitory Control and Attention, List Sorting Working Memory, Dimensional Change Card Sort, Pattern Comparison Processing Speed, and Picture Sequence Memory) (Weintraub et al., Reference Weintraub, Dikmen, Heaton, Tulsky, Zelazo, Bauer and Gershon2013). Of these, both instruments that assess crystallized cognition and two that assess fluid cognition (List Sorting Working Memory and Picture Sequence Memory) can be modified for administration without any physical contact between examinee and tablet. The other three fluid cognition instruments are scored based on accuracy and reaction time, and thus require in-person inputs into the tablet.
In the context of the COVID-19 pandemic where strict physical distancing guidelines have been implemented, there is a strong need for remote cognitive assessments (Gostin & Wiley, Reference Gostin and Wiley2020). Our group has previously developed and validated a protocol for administering the NIHTB-CB using telemedicine to assess participants at remote sites (Rebchuk et al., Reference Rebchuk, Deptuck, O’Neill, Fawcett, Silverberg and Field2019). However, this protocol still requires in-person conditions for some instruments. Recent guidelines published by the NIHTB-CB developers describe an abbreviated protocol, incorporating only the four instruments that can be administered entirely remotely (HealthMeasures Help Desk, 2020a).
We sought to explore whether a prorated score based on this abbreviated battery could provide a valid substitute for the standard score from the full battery. We assessed the agreement between prorated fluid and total cognition scores from the abbreviated protocol versus standard scores from the full protocol. The equations we applied to estimate prorated scores were derived from published regression equations for NIHTB-CB standard scores (Casaletto et al., Reference Casaletto, Umlauf, Beaumont, Gershon, Slotkin and Heaton2015; HealthMeasures Help Desk, 2020b). As much ongoing research has been modified to facilitate physical distancing, this work helps to inform the future interpretation of data collected with the abbreviated NIHTB-CB protocol.
We extracted participant-level NIHTB-CB data gathered under standard conditions by trained examiners as part of six previous or ongoing studies in individuals with neurological disease [history of stroke or mild traumatic brain injury (mTBI)] or psychosis (inpatients with treatment-resistant psychosis) and healthy controls (no history of neurological disease, learning disability, or active psychosis). See Supplementary material for details of respective studies.
For all data sets, the NIHTB-CB was administered on an iPad (Apple, California, USA), and Form A of the cognition battery was used. Participant demographic data were captured with written questionnaires. All participants were older than 18 years and provided written informed consent. The experimental protocols for the respective studies were approved previously by the University of British Columbia’s Clinical Research Ethics Board, and conformed to the Declaration of Helsinki.
We chose to report standard scores corrected for age (mean = 100, standard deviation = 15) and not for other demographic variables because education levels may not be equivalent across regions where our data were collected (Vancouver, Canada) and where the NIHTB-CB was normed (United States) (Chevalier et al., Reference Chevalier, Stewart, Nelson, McInerney and Brodie2016). In addition, several of our participants identified with race(s) that the NIHTB-CB race/ethnicity options failed to capture.
Prorated fluid (Equation 1) and total (Equation 2) cognition scores were derived from appropriate regression equations provided with the NIHTB-CB (Casaletto et al., Reference Casaletto, Umlauf, Beaumont, Gershon, Slotkin and Heaton2015; HealthMeasures Help Desk, 2020b). The prorated fluid cognition score included instruments (List Sorting Working Memory and Picture Sequence Memory) that can be administered remotely without the examinee having direct access to the tablet.
(1) Prorated Fluid Cognition = 100 + 15 * [((Mean of List Sort & Pic Seq Mem Age-corrected Scores) – 100.15)/10.10]
(2) Prorated Total Cognition = 100 + 15 * [((Mean of Age-corrected Prorated Fluid Composite Score & Crystallized Composite Scores) – 100.02)/12.93]
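To make the arithmetic of Equations 1 and 2 concrete, the following is a minimal sketch in Python. The function and variable names are our own illustrative choices; the regression constants are the published values cited above.

```python
# Illustrative implementation of Equations 1 and 2 (names are ours;
# constants are from the published NIHTB-CB regression equations).

def prorated_fluid(list_sort: float, pic_seq_mem: float) -> float:
    """Equation 1: prorated fluid cognition from two age-corrected scores."""
    mean_score = (list_sort + pic_seq_mem) / 2
    return 100 + 15 * (mean_score - 100.15) / 10.10

def prorated_total(prorated_fluid_score: float, crystallized: float) -> float:
    """Equation 2: prorated total cognition from the prorated fluid and
    crystallized composite scores."""
    mean_score = (prorated_fluid_score + crystallized) / 2
    return 100 + 15 * (mean_score - 100.02) / 12.93

# Hypothetical example: an examinee scoring at the normative means
fluid = prorated_fluid(100.15, 100.15)   # -> 100.0
total = prorated_total(100.0, 100.04)    # approximately 100.0
```

Both equations rescale the mean of the available subtest scores back onto the standard-score metric (mean = 100, SD = 15) using the normative mean and SD of that subtest mean.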
Data were separated into healthy controls and disease-specific groups (stroke, mTBI, and psychosis). Demographic data between groups were compared using one-way analysis of variance for parametric data and chi-square test for categorical data.
Paired t tests were used to compare standard and prorated fluid cognition scores within each group; data met assumptions of normality (Meyers et al., Reference Meyers, Zellinger, Kockler, Wagner and Miller2013). Prediction error was determined as the difference between standard and prorated scores for each participant, and mean prediction error was calculated for each group. Intraclass correlation (ICC) values between standard and prorated fluid cognition group-level scores were generated using a two-way mixed-effects, absolute-agreement, multiple-measurements model (Koo & Li, Reference Koo and Li2016). Data met assumptions of normality and equality of variance for ICC analyses. All analyses were repeated for the prorated total cognition scores. We operationalized a clinically meaningful discrepancy as 0.5 standard deviations (or 7.5 standard score points) and calculated the frequency of participants with prorated–standard discrepancies exceeding this magnitude (Silverberg & Millis, Reference Silverberg and Millis2009). A prediction error of zero reflects equal standard and prorated scores. Chi-square tests were used to compare the observed frequencies of participants with clinically significant prediction errors (i.e., exceeding a ±0.5 SD difference between standard and prorated scores) between groups. Data met the assumptions of chi-square testing.
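As an illustration of the discrepancy classification described above, the sketch below flags participants whose prediction error exceeds the ±0.5 SD (7.5 standard-score point) cutoff. This is our own code, not the original SPSS analysis, and the example score pairs are hypothetical.

```python
# Minimal sketch of the prediction-error classification (our own code;
# the example score pairs below are hypothetical, not study data).

THRESHOLD = 0.5 * 15  # 0.5 SD on the standard-score metric = 7.5 points

def prediction_error(standard: float, prorated: float) -> float:
    """Prorated minus standard score; zero means perfect agreement."""
    return prorated - standard

def classify(standard: float, prorated: float) -> str:
    """Label a participant's discrepancy against the +/-7.5-point cutoff."""
    err = prediction_error(standard, prorated)
    if err > THRESHOLD:
        return "overestimated"
    if err < -THRESHOLD:
        return "underestimated"
    return "within threshold"

# Hypothetical (standard, prorated) score pairs
pairs = [(95.0, 104.0), (102.0, 100.0), (110.0, 101.0)]
labels = [classify(s, p) for s, p in pairs]
# labels -> ['overestimated', 'within threshold', 'underestimated']
```

The group-level frequencies of "overestimated" and "underestimated" labels are what the chi-square tests then compare across diagnostic groups.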
Given the exploratory nature of the study, we did not correct for multiple comparisons. Significance was set a priori at 0.05. Statistical analyses were performed using IBM SPSS Statistics (Version 19.0; IBM Corp., Armonk, NY).
Data were available for 245 participants: 77 (31.4%) healthy controls, 66 (26.7%) individuals with mTBI, 63 (25.7%) with a history of stroke, and 39 (15.9%) with active psychosis. Almost half (48.6%) were female, mean age was 41.8 years (SD = 11.9), and mean duration of education was 14.9 years (SD = 2.6). Subgroup characteristics are shown in Table A1 in the Appendix.
Overall, prorated fluid cognition scores were higher than standard fluid cognition scores (mean difference +4.5, SD = 14.3; p < 0.001). These differences were significant in the stroke and mTBI groups, but not in the healthy control or psychosis groups. Consequently, prorated total cognition scores were also higher overall than standard total cognition scores (mean difference +2.7, SD = 8.3; p < 0.001). Again, these differences were significant only in the stroke and mTBI groups (see Table 1). Agreement between prorated and standard scores, per the ICC, was moderate-to-good for fluid cognition and good-to-excellent for total cognition.
Clinically significant fluid cognition prediction errors (greater than a ±0.5 SD difference between standard and prorated scores) were present in 62.9% of participants; 42.9% were overestimated and 20.0% were underestimated. For total cognition, 40.4% of participants had a prediction error; 28.6% were overestimated and 11.8% were underestimated (Figure 1). The psychosis group had the lowest percentage (59.0%) of fluid prediction errors, followed by the healthy control (60.0%), mTBI (65.2%), and stroke (69.8%) groups. These numerical differences did not reach statistical significance (p = 0.425). For total cognition, healthy controls had the lowest percentage (33.8%) of prediction errors greater than ±0.5 SD, followed by those with psychosis (35.9%), mTBI (42.4%), and stroke (49.2%). Again, these differences were not statistically significant (p = 0.275).
The aim of this exploratory study was to assess the validity of a prorated score, based on a proposed abbreviated NIHTB-CB protocol, against the standard score for the usual protocol (HealthMeasures Help Desk, 2020a). Particularly during COVID-19-related physical distancing measures, the potential advantage of an abbreviated protocol is its ability for remote administration without personnel alongside the examinee. Beyond the COVID-19 pandemic, advantages of a fully remote protocol could include greater participation by those with mobility restrictions or in isolated communities, and fewer losses to follow-up (Berge et al., Reference Berge, Stapf, Al-Shahi Salman, Ford, Sandercock and van der Worp2016).
Overall, we found that prorated scoring for the abbreviated protocol overestimated fluid and total cognition standard scores. However, differences were noted between testing groups, with no group-level differences seen between prorated and standard scores in healthy individuals or in those with treatment-resistant psychosis.
It is uncertain whether these significant differences in group-level performance reflect true differences related to domain-specific deficits from lesional injuries in the stroke and mTBI groups, random error, or insufficient statistical power to detect differences in the healthy control group or, in particular, the psychosis group, which had the fewest participants (McInnes et al., Reference McInnes, Friesen, MacKenzie, Westwood and Boe2017; Nys et al., Reference Nys, van Zandvoort, de Kort, Jansen, de Haan and Kappelle2007; O’Brien et al., Reference O’Brien, Erkinjuntti, Reisberg, Roman, Sawada, Pantoni and DeKosky2003). The instruments included in our prorated scores measure working memory and episodic memory but do not capture processing speed, attention, or executive function (Mungas et al., Reference Mungas, Heaton, Tulsky, Zelazo, Slotkin, Blitz and Gershon2014). Anatomic lesions or functional deficits (e.g., frontal lobe injury, motor deficits, and fatigue) in the stroke and mTBI cohorts may result in worse performance on executive function and timed tasks in particular, and hence lead to overestimation when the instruments assessing these domains are excluded from prorated scores. The data were collected as part of six separate studies, and unmeasured confounders specific to study conditions may also play a role.
Although the exclusion of processing speed, attention, and executive function tests from prorated scores did not significantly affect the group-level assessment of the healthy control and psychosis cohorts, we cannot confidently conclude that prorated scores are equivalent to standard scores in these groups. Amongst healthy controls, 60.0% of prorated scores were overestimated or underestimated by a clinically significant margin, and amongst psychosis patients, the rate was 59.0%. Given the substantial variability in participant-level performance, the two methods should not be considered equivalent for individual-level data.
Our study has limitations. Our findings are limited to healthy individuals and those with stroke, mTBI, or treatment-resistant psychosis. Future studies should explore whether there are groups for whom an abbreviated protocol is appropriate. Additionally, we only reported age-corrected scores, which do not control for sex, education, and ethnicity of participants; these factors may influence NIHTB-CB performance (Casaletto et al., Reference Casaletto, Umlauf, Beaumont, Gershon, Slotkin and Heaton2015).
At present, we have compared in-person testing with prorated versus standard scoring in advance of considering entirely remote adaptations of the NIHTB-CB protocol. We have not prospectively validated an abbreviated remote protocol, as we are limited by current physical distancing recommendations related to the COVID-19 pandemic.
In conclusion, an abbreviated NIHTB-CB protocol is a pragmatic solution in the context of physical distancing requirements, but does not constitute a valid replacement for the standard protocol. Our preliminary findings suggest that prorated scores excluding the Flanker Inhibitory Control and Attention, Dimensional Change Card Sort, and Pattern Comparison Processing Speed instruments may tend to overestimate Fluid Composite scores. Thus, a fully remote version of the NIHTB-CB should include adapted versions of the timed instruments. We provide empirical evidence in support of newly updated guidelines by the NIHTB developers, which now state that prorated scores may not be comparable to standard scores (Salesforce, 2020). Still, remote administration of the current abbreviated protocol warrants further validation of the nontimed instruments. These individual instruments, administered remotely, may still benefit continuity of research measuring crystallized cognition and working and episodic memory.
The authors thank Leah Kuzmuk, Halina Deptuck, Zoe O’Neill, Hadley Pearce, Tasha Klotz, and Hiresh Gindwani for their assistance in data collection. This article was discussed with the Department of Medical Social Sciences at Northwestern University, which governs the scientific activity of NIH Toolbox, prior to submission. At this time, neither the authors’ group nor theirs would recommend an abbreviated NIHTB-CB protocol with prorated scoring as a replacement for the standard protocol.
NDS reports salary support from the Michael Smith Foundation for Health Research. TSF is supported by a Heart and Stroke Foundation of Canada National New Investigator Award, a Michael Smith Health Professional Investigator Award, and a Vancouver Coastal Health Research Institute Clinician-Scientist Award.
Conflict of interest
IJT has received consulting fees or sat on advisory boards for Lundbeck Canada, Sumitomo Dainippon, and Community Living British Columbia (CLBC). TSF receives study medication from Bayer Canada. The other authors report no relevant conflicts.
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617720001010