Accurately capturing cognitive changes along the Alzheimer’s disease (AD) continuum is essential for monitoring clinical progression and evaluating the efficacy of treatments to slow or halt cognitive decline (Evans et al., 2018; Food and Drug Administration, 2018). Future AD research and clinical trials will likely target a narrower spectrum of participants at earlier disease stages, in which AD pathology is present but clinical symptoms remain subtle or absent (Bateman et al., 2012; Dubois et al., 2016). In these (pre)clinical stages, the expected degree of observable cognitive decline over a given time interval is likely smaller than in later clinical stages, in which cognitive impairment is (more) evident and decline is more rapid (Sperling et al., 2011). As most existing cognitive tests were primarily designed to track decline in the mild cognitive impairment (MCI) and dementia stages, they are probably suboptimal for tracking the subtle cognitive decline observed in preclinical stages of AD (Evans et al., 2018; Mortamais et al., 2017; Rentz et al., 2013).
In addition, several studies have shown that tests addressing specific cognitive domains such as episodic memory and semantic memory are more likely to capture decline in early clinical stages of AD than, for example, tests addressing global cognition (Lim et al., 2016; Mortamais et al., 2017; Papp, Rentz, Orlovsky, Sperling, & Mormino, 2017). Consequently, optimal cognitive endpoints to assess cognitive changes over time are likely to differ across clinical stages on the AD continuum.
In the recently updated National Institute on Aging – Alzheimer’s Association (NIA-AA) research framework, Jack et al. proposed a novel clinical staging scheme to classify individuals along the continuum of AD, based on the presence of AD pathophysiology and severity of clinical symptoms (Jack et al., 2018). In this 4+ clinical staging scheme, Stage 1 is described as a preclinical stage in which overt clinical symptoms are absent. Stage 2 is defined by subjective concerns regarding previous level of functioning and/or subtle abnormalities detectable on (longitudinal) sensitive cognitive testing, without the presence of any functional impairment. In Stage 3, abnormalities on cognitive tests are more apparent and mild functional impairment may be detectable. Finally, Stages 4, 5, and 6 are described as overt dementia, corresponding to mild, moderate, and severe dementia, respectively. Grouping individuals into these refined clinical stages may be beneficial in optimizing the selection and assessment of participants for a given treatment target and may be similarly beneficial in refining the cognitive assessment procedures optimal for a given stage (Jack et al., 2019). However, specific procedures to operationalize these stages have yet to be delineated. Moreover, it is as yet unknown to what extent currently used cognitive outcomes vary in their ability to capture decline at the different clinical stages defined in the NIA-AA research framework.
A more refined understanding of the differential sensitivity of existing neuropsychological tests at the different NIA-AA-defined clinical stages is needed to provide guidance on the selection of outcome measures when this framework is applied to define the treatment population. Therefore, the current study aimed to identify sensitive measures to detect cognitive change for each of the four NIA-AA-defined clinical stages (Jack et al., 2018). To achieve this, we operationalized the NIA-AA clinical staging scheme into measurable criteria and applied these criteria across AD biomarker-positive individuals obtained from four large cohorts. Subsequently, we investigated the sensitivity to decline of commonly used neuropsychological tests by clinical stage, with a focus on the pre-dementia Stages 1 to 3. We hypothesized that with increasing clinical stage, more tests would show greater sensitivity to decline.
Data were obtained from the Harvard Aging Brain Study (HABS), the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and National Alzheimer’s Coordinating Center (NACC) databases, and the Amsterdam Dementia Cohort (ADC). For each cohort, specific recruitment criteria and data collection have been described in detail elsewhere (Aisen et al., 2010; Beekly et al., 2007; Dagley et al., 2017; van der Flier, 2018). Briefly, the HABS cohort is a community-based sample of individuals aged ≥ 63 years, who are considered clinically normal at baseline by (1) a global Clinical Dementia Rating score of 0; and (2) performance above education-adjusted cut-offs on Logical Memory Story A Delayed Recall and the Mini-Mental State Examination (MMSE) (Dagley et al., 2017). ADNI is a multicenter longitudinal cohort study with the primary goal of testing whether serial neuroimaging and other biological, clinical, and neuropsychological markers can be combined to measure clinical progression on the AD spectrum (http://adni.loni.usc.edu/wp-content/uploads/2008/07/adni2-procedures-manual.pdf). The NACC has developed and maintains a large database of standardized clinical and neuropathological research data, obtained from NIA-funded Alzheimer’s Disease Centers across the United States (Beekly et al., 2007; Morris et al., 2006). The NACC database contains mostly memory clinic-referred participants with some additional community-based recruitment.
The ADC is a Dutch memory clinic-based cohort of individuals visiting the Alzheimer Center Amsterdam (van der Flier, 2018). All participants in the ADC had undergone a standard diagnostic work-up, including medical history, neurological examination, laboratory screening tests, and neuropsychological evaluation. All studies were approved by an ethical review board, and all participants provided written informed consent to use their clinical data for research purposes. All data included in this manuscript were obtained in compliance with the Helsinki Declaration. All data were collected between June 2002 and April 2018.
Selection criteria for the current study included (1) amyloid positivity as determined by at least one abnormal marker of amyloid accumulation in positron emission tomography (PET) imaging or amyloid-β levels in cerebrospinal fluid (CSF) using previously published cohort-specific summary measures and cut-offs [19–21] (detailed amyloid assessment methods for each cohort are specified below); (2) at least one follow-up visit with neuropsychological testing available; and (3) MMSE > 10 or Montreal Cognitive Assessment > 2 at baseline (Folstein, Folstein, & McHugh, 1975; Nasreddine et al., 2005). Neuropsychological baseline performance was anchored to time of first amyloid assessment.
AD Biomarker Classification
The HABS and ADNI cohorts used PET imaging data only, while both PET and CSF measures were used in NACC and ADC. Amyloid binding in HABS was measured using Pittsburgh compound B (PIB) PET scanning, and in ADNI using Florbetapir AV-45 PET scanning (summary data were obtained from the ADNI Laboratory of Neuroimaging database: http://www.loni.ucla.edu/ADNI/). In HABS, amyloid positivity was based on Gaussian mixture modeling (GMM) of the distribution volume ratio of mean uptake in frontal, lateral parietal and temporal, and retrosplenial regions (cut-off value ≥ 1.20; Mormino et al., 2014). In ADNI, amyloid positivity was based on standard uptake value ratios of mean uptake in four cortical regions (frontal, cingulate, parietal, and temporal cortices) normalized to the whole cerebellum uptake (cut-off value ≥ 1.11; Clark et al., 2011). In the NACC cohort, amyloid positivity in PET or CSF was based on each center’s local standard for positivity (Besser et al., 2018). In the ADC, amyloid positivity on Florbetaben, Flutemetamol, PIB, or Florbetapir AV-45 PET scans was based on whole-brain visual assessment performed by an experienced nuclear medicine physician (B.N.M.B.) who was blinded to clinical information. Amyloid positivity in CSF in the ADC was based on drift-corrected amyloid-β 1–42 values, with a cut-off of 813 pg/ml (Tijms et al., 2018).
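The quantitative cut-offs above can be illustrated with a minimal sketch. The function and constant names are hypothetical and do not come from any of the cohorts' own software. The direction of the CSF comparison (values below the cut-off counted as abnormal) is an assumption based on the usual interpretation of low CSF amyloid-β 1–42, and the center-specific NACC rules and the ADC visual reads are not modeled.

```python
# Cut-offs as reported in the text; all names here are illustrative only.
HABS_DVR_CUTOFF = 1.20    # PIB-PET distribution volume ratio
ADNI_SUVR_CUTOFF = 1.11   # Florbetapir standard uptake value ratio
ADC_CSF_CUTOFF = 813.0    # drift-corrected amyloid-beta 1-42, pg/ml


def pet_amyloid_positive(cohort, uptake_ratio):
    """Return True if a PET summary value meets the cohort's cut-off."""
    cutoffs = {"HABS": HABS_DVR_CUTOFF, "ADNI": ADNI_SUVR_CUTOFF}
    if cohort not in cutoffs:
        raise ValueError(f"no quantitative PET cut-off for cohort {cohort!r}")
    return uptake_ratio >= cutoffs[cohort]


def adc_csf_amyloid_positive(abeta42_pg_ml):
    """Assumption: CSF values below the cut-off are treated as abnormal."""
    return abeta42_pg_ml < ADC_CSF_CUTOFF
```

Keeping the PET and CSF rules in separate functions mirrors the fact that the two modalities are thresholded in opposite directions under the stated assumption.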
To operationalize the NIA-AA clinical staging scheme, we used baseline measures of subjective cognitive decline (SCD), cognitive impairment, and functional impairment available across cohorts (Table 1).
* Based on the highest and lowest quintile in our study sample.
** Only if caused by endorsement of memory box of the CDR-SB with 0.5.
Subjective cognitive decline
SCD (yes/no) was quantified as either a memory clinic visit or by an SCD screening score for the community-based cohorts. That is, by definition, all individuals from the ADC memory clinic cohort were classified as having SCD. For the NACC cohort, all individuals who were referred by an ADRC (n = 154, 75%) were classified as having SCD as well. For the remaining NACC individuals, as well as for all individuals from the ADNI and HABS cohorts, an SCD screening score was calculated. This SCD screening score was extracted from single items from available self-report questionnaires (i.e. the Everyday Cognition scale (ECog) (Farias et al., 2008), the Geriatric Depression Scale – 12 item version (GDS-12) (Yesavage et al., 1983), and the Memory Questionnaire (MemQ)), addressing (1) recent change in memory functioning (MemQ item 1); (2) consistent change over the last few months (GDS-12 item 6); and (3) concern associated with this change (ECog item 1). Answers were scored as yes = 1 and no = 0, leading to a total score range from 0 to 3, with higher scores reflecting a higher level of SCD. A score of 2 or higher was labeled as having SCD.
Level of cognitive impairment was quantified using (1) the MMSE (or a MOCA-transformed score if MMSE was unavailable) (Folstein et al., 1975; Nasreddine et al., 2005; Trzepacz, Hochstetler, Wang, Walker, & Saykin, 2015); and (2) a memory retention score reflecting the proportion of items recalled from either story or word list on immediate and delayed recall (Schmidt, 1996; Wechsler, 1987).
Severity of functional impairment was determined using the CDR® Dementia Staging Instrument global score or the CDR sum of boxes (CDR-SB) (Hughes, Berg, Danziger, Coben, & Martin, 1982; M. M. Williams, Storandt, Roe, & Morris, 2013).
For all measures, we created stage-specific cut-off scores based on previously published data or the highest or lowest quintiles in our sample (Table 1). To summarize, Stage 1 was quantified as (1) no visit to a memory clinic or an SCD screening score <2; (2) an MMSE score of ≥26 (Chapman et al., 2016), a proportion of ≥52% items learned on a story or word list (highest quintile in our sample), and a delayed recall of >11 items on the logical memory tests (Aisen et al., 2010); and (3) a CDR-SB score ≤ .05 and global CDR of 0 (M. M. Williams et al., 2013). Stage 2 was quantified as (1) an MMSE score of ≥26 (Chapman et al., 2016), a proportion of ≥52% items learned on a story or word list and a delayed recall; and (2) a CDR-SB score ≤1 and global CDR of <.5 only if caused by endorsement of memory box of the CDR-SB of 0.5. Stage 3 was quantified as (1) an MMSE score ≥ 24 (Chapman et al., 2016) and a proportion of items learned ≥20% (lowest quintile in our sample); and (2) a CDR-SB score between 1.5 and 4 and global CDR of .5 (M. M. Williams et al., 2013). Stage 4 was quantified as (1) an MMSE score <26 and a proportion of items learned <20% (lowest quintile in our sample); and (2) a CDR-SB score ≥4.5 and global CDR of ≥1 (M. M. Williams et al., 2013).
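The scoring rule for the SCD screening score can be sketched as follows. This is a minimal illustration with hypothetical function and argument names; it assumes each of the three items has already been reduced to a yes/no answer.

```python
def scd_screening_score(recent_change, consistent_change, concern):
    """Sum three yes/no items (yes = 1, no = 0), giving a score from 0 to 3."""
    return int(bool(recent_change)) + int(bool(consistent_change)) + int(bool(concern))


def has_scd(score, threshold=2):
    """A score of 2 or higher is labeled as SCD, as described in the text."""
    return score >= threshold


# Example: memory change and concern endorsed, but no consistent change.
score = scd_screening_score(True, False, True)
print(score, has_scd(score))  # -> 2 True
```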
We initially applied a strict approach in that at least one variable from all clinical features (SCD, cognition and function) was required for categorization, in order to create distinct categories in line with clinical trial screening procedures. Participants who remained unclassified due to incongruencies among clinical features were classified in the stage in which the majority of their classification measures fit best. For example, an individual with an MMSE of 23, a proportion items learned of .45, a CDR-SB of 2, and a global CDR score of 0.5 was classified as Stage 3.
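The majority-fit rule for initially unclassified participants can be sketched as a simple vote across measures. This is an illustration under stated assumptions, not the authors' procedure: the function name is hypothetical, each measure is assumed to have already been mapped to the stage(s) whose cut-off it satisfies, and ties return no stage because the text does not say how ties were resolved.

```python
from collections import Counter


def majority_stage(compatible_stages):
    """Pick the stage supported by the most measures.

    compatible_stages maps a measure name to the set of stages whose
    cut-off that measure satisfies. Returns None when there are no votes
    or when the top two stages are tied.
    """
    votes = Counter(
        stage for stages in compatible_stages.values() for stage in stages
    )
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# The example from the text. The per-measure stage sets are one reading of
# the cut-offs described above, not values reported by the authors.
example = {
    "MMSE=23": {4},
    "proportion_learned=0.45": {3},
    "CDR-SB=2": {3},
    "global_CDR=0.5": {3},
}
print(majority_stage(example))  # -> 3
```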
We selected all neuropsychological tests available in at least two cohorts. This resulted in a total of 12 different tests providing 17 individual measures. Supplementary Table 1 provides an overview of which tests were available by cohort. Global cognition measures included the MMSE and MOCA (Folstein et al., 1975; Nasreddine et al., 2005). Episodic memory measures included the Wechsler memory subscale Logical Memory (LM) immediate recall and delayed recall (Wechsler, 1987), and the Rey Auditory Verbal Learning Test (RAVLT) (Schmidt, 1996), with version 1 used in ADC, and versions 1 and 2 alternated in ADNI. Additionally, we standardized immediate recall, delayed recall, and recognition measures obtained from list learning tests that were available across cohorts (Selective Reminding Test in HABS, the ADAS-Cog Word Lists in ADNI, and RAVLT in ADC), by calculating z-scores by cohort with total baseline mean and standard deviation as reference values. Subsequently, these z-scores were combined into an overall immediate recall score, an overall delayed recall score, and an overall recognition score. Semantic memory measures included the Category Fluency Test (CFT) Animals and Vegetables (Lezak, 2004) and the Boston Naming Test (BNT) (B. W. Williams, Mack, & Henderson, 1989). Executive functioning (EF) measures included the controlled oral word association test (COWAT) (Ruff, Light, Parker, & Levin, 1996) and Trail Making Test (TMT) part B (Tombaugh, 2004). Attention and working memory measures included the Wechsler Adult Intelligence Scale (WAIS-IV) subscales Digit Span Forward and Backward (Wechsler, 2008).
Measures of processing speed included the TMT part A (Tombaugh, 2004) and WAIS-IV Symbol Substitution (Wechsler, 2008).
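The cohort-wise standardization of the list learning measures can be sketched as below. This is a minimal illustration with hypothetical names and data layout; it assumes the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev


def zscores_by_cohort(baseline_scores, scores):
    """Standardize scores against each cohort's baseline mean and SD.

    baseline_scores: dict mapping cohort -> list of baseline scores,
        used as the reference distribution for that cohort.
    scores: list of (cohort, raw_score) pairs to standardize.
    """
    reference = {
        cohort: (mean(values), stdev(values))
        for cohort, values in baseline_scores.items()
    }
    result = []
    for cohort, raw in scores:
        mu, sd = reference[cohort]
        result.append((raw - mu) / sd)
    return result
```

Because each cohort is scaled against its own baseline distribution, the resulting z-scores from different list learning tests can be pooled into a single overall score, as described above.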
Statistical analyses were performed using R version 3.5.3 (R Core Team, 2018). Statistical significance was set at p < .05. Demographic and clinical differences between stages were investigated using Chi-square tests and one-way analyses of variance followed by Tukey’s HSD test for post hoc comparisons.
To investigate the sensitivity to change over time of each neuropsychological test, a series of linear mixed models (LMMs) was fitted with a random intercept and slope for each subject. Separate models were run for each test score (dependent variable), with time (measured on a continuous level), age (centered at overall mean), sex, education (centered at overall mean), stage (as categorical variable), and the interaction between time and stage as independent variables. To examine whether tests were differentially sensitive to change over time across stages, we focused on the time and time × stage estimates. Subsequently, LMMs were repeated for each neuropsychological test separately for each stage, to identify which tests were sensitive within 12 months of follow-up within each stage. For tests that were identified as sensitive to 12-month decline, mean to standard deviation ratios (MSDRs) of change from baseline to 12, 24, and 36 months of follow-up were calculated as a measure of effect size.
A total of 1103 participants were included (HABS n = 74, ADNI n = 506, NACC n = 204, ADC n = 319). The majority of the US individuals were Caucasian (93% in ADNI, 92% in NACC, and 78% in HABS). Overall, years of follow-up ranged from 1 to 12.9, with a mean of 2.3 ± 1.6 years, and number of follow-up visits ranged from 1 to 8, with a mean of 2.3 ± 1.4 visits. After applying our defined staging criteria, n = 1005 (91%) participants were initially classified in one of the four stages, while n = 99 (9%) remained unclassified due to incongruencies among classification measures and were then classified in the closest stage. Ultimately, n = 120 were classified as Stage 1, n = 206 as Stage 2, n = 467 as Stage 3, and n = 309 as Stage 4+.
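The MSDR used as an effect size can be illustrated with a short sketch. The function name is hypothetical, and two details are assumptions because the text does not specify them: change is computed as follow-up minus baseline, and the sample standard deviation is used.

```python
from statistics import mean, stdev


def msdr(baseline, follow_up):
    """Mean-to-standard-deviation ratio of change from baseline.

    baseline and follow_up are paired lists (same participants, same
    order). At least two participants and a non-zero SD of change are
    required.
    """
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


# Three hypothetical participants who decline by 1, 2, and 3 points.
print(msdr([20, 22, 18], [19, 20, 15]))  # -> -2.0
```

Under these assumptions a negative MSDR indicates decline, and a larger absolute value indicates a more consistent change relative to its variability.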
Table 2 presents the demographic and clinical characteristics for the different stages. Groups differed regarding age, sex, and education, with post hoc comparisons revealing that Stage 1 participants were older than all other participants and Stage 2 participants were older than Stage 4 participants. Post hoc comparisons also showed that Stage 1 participants had higher education levels compared with all other participants and Stage 2 had higher education than Stage 4 participants. By definition, we found worse MMSE and CDR-SB scores with increasing clinical stage. All HABS participants fell into Stages 1 and 2, all ADC participants were classified into Stages 2 to 4, and ADNI and NACC participants were distributed among all stages. Overall, the four stages were in correspondence with clinical diagnoses as established in the original cohorts, given that Stage 1 only included cognitively normal participants, and that most SCD, MCI, and dementia participants fell into Stages 2, 3, and 4, respectively (Table 2).
HABS, Harvard Aging Brain Study; ADC, Amsterdam Dementia Cohort; ADNI, Alzheimer’s Disease Neuroimaging Initiative; NACC, National Alzheimer’s Coordinating Center.
Sensitivity to Decline Over Time by Clinical Stage
Overall, LMM results revealed that neuropsychological tests were differentially sensitive to change over time at different clinical stages, as indicated by the Time and Time × Stage estimates in Table 3 (and Supplementary Table 2). For example, CFT Animals detected annual decline at all stages, showing a significant decline of .58 points (i.e. animals named in 60 seconds) in Stage 1 and Stage 2 (p < .001), a marginally significant steeper decline in Stage 3 (−.28 points, p = .06), and a significantly steeper decline in Stage 4 (−1.27 points, p < .001). In contrast, the MMSE did not capture decline in Stages 1 and 2, but showed steeper decline in Stage 3 (−1.13 points, p < .001) and Stage 4 (−2.23 points, p < .001). Figure 1 illustrates the difference in sensitivity to decline over time for these two measures.
Second, LMM results indicated that Time × Stage estimates increased with clinical stage (Table 3), with most tests exhibiting a steeper decline in Stages 3 and 4 than in Stage 1.
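To make the relation between the Time and Time × Stage estimates explicit, the model described in the statistical analysis can be written out as below. This is a sketch in our own notation, assuming Stage 1 is the reference category (the text does not state which stage served as reference): $y_{ij}$ is the test score of participant $i$ at visit $j$, $t_{ij}$ is time since baseline, $D_{s,i}$ indicates membership of Stage $s$, and $b_{0i}$ and $b_{1i}$ are the random intercept and slope.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 t_{ij}
        + \sum_{s=2}^{4} \gamma_s D_{s,i}
        + \sum_{s=2}^{4} \delta_s \,\bigl(t_{ij} \times D_{s,i}\bigr) \\
       &\quad + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i
        + \beta_4\,\mathrm{education}_i
        + b_{0i} + b_{1i}\, t_{ij} + \varepsilon_{ij}
\end{aligned}
```

Under this parameterization, the Time estimate $\beta_1$ is the annual change in Stage 1, each Time × Stage estimate $\delta_s$ is the additional annual change in Stage $s$ relative to Stage 1, and the stage-specific annual change is $\beta_1 + \delta_s$.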
Sensitivity to Decline by Follow-Up Duration
Annual decline on the individual neuropsychological tests summarized by cognitive domain is shown in Figure 2, with corresponding regression coefficients presented in Supplementary Table 3. Only three measures captured significant decline after 12 months in Stage 1 (CFT Animals, CFT Vegetables, LM Immediate Recall), and only four measures captured decline after 12 months in Stage 2 (CFT Animals, CFT Vegetables, Word List Delayed Recall, and TMT-B). Six additional measures detected decline in Stage 3 (Digit Span Backward, Digit Span Forward, Word List Recognition, MMSE, MOCA, and TMT-A). In Stage 4, all measures detected decline after 12 months, except for Word List Recognition. Table 4 presents the tests that detected decline after 12 months separately for Stages 1, 2, and 3, including their MSDRs at 12, 24, and 36 months of follow-up, showing that, overall, MSDRs tended to increase over time.
CFT, Category Fluency Test; LM, Logical Memory; TMT, Trail Making Test; MMSE, Mini-Mental State Examination; MOCA, Montréal Cognitive Assessment.
N.a. = Data not available for this time point.
We operationalized the NIA-AA clinical staging scheme into measurable criteria, which enabled classification of individuals with biomarker evidence of AD into the four different stages of clinical severity. Subsequently, we demonstrated that neuropsychological tests differed in their sensitivity to decline at these different clinical stages. We found that more tests were sensitive to decline with increasing clinical stage. Moreover, in general, longer time intervals were needed to capture greater magnitude of change. Sensitive tests for Stage 1 focused primarily on semantic memory, whereas sensitive tests in succeeding stages also covered episodic memory, executive functioning (Stage 2), working memory, processing speed, and global cognition (Stages 3 and 4).
We found that most tests were sensitive to change over time in Stages 3 and 4 and that the majority of those were capable of detecting decline after 1 year. This is unsurprising, as most of those measures were originally designed to detect frank cognitive decline, which is assumed to become apparent from Stage 3 on (Jack et al., 2018). However, we also identified a few measures that were sensitive to decline at Stages 1 and 2. Sensitive tests for Stage 1 included the CFT Animals and Vegetables (semantic memory) and LM immediate recall (episodic memory), whereas sensitive tests in Stage 2 included both CFT measures as well, plus Word List Delayed Recall (episodic memory) and TMT-B (executive functioning). These results are largely in line with other studies showing that tests addressing the cognitive domains semantic memory, episodic memory, and executive functioning are among the first to decline in early AD (Mortamais et al., 2017; Papp et al., 2017; Ritchie et al., 2017).
Notably, not all measures addressing a particular cognitive domain were similarly sensitive to change across stages. For instance, a discrepancy between two semantic memory measures was observed, as the CFT detected decline in Stages 1 and 2, whereas the BNT did not. Possible explanations for this include differences regarding the difficulty level of both tests, as well as differences in measurement characteristics; for example, the BNT is untimed and has a fixed maximum score. As such, the BNT might be more susceptible to ceiling effects as compared to the CFT, thereby limiting its ability to detect decline in individuals with only subtle cognitive impairment. However, this is not to suggest that measurement properties alone influence the sensitivity of tests, as illustrated by the differences in sensitivity to change between the CFT and COWAT measures, which have similar measurement properties but rely on different cognitive strategies (Henry, Crawford, & Phillips, 2004).
Multi-domain cognitive composite measures have been designed in an attempt to improve the measurement of cognitive change in early stages of AD. Examples include the Preclinical Alzheimer’s Cognitive Composite (PACC) (Donohue et al., 2014) and the Alzheimer’s Prevention Initiative (API) composite for preclinical AD (Langbaum et al., 2014), and the Alzheimer’s disease Composite Score (Wang et al., 2016) and the Cognitive-Functional Composite for prodromal stages of AD (Jutten et al., 2017). While composite scores resulting from these measures seem capable of tracking decline in preclinical and prodromal stages of AD (Donohue et al., 2017; Lim et al., 2016), few studies have focused on the individual components that contribute to these composites. In the current study, we identified sensitive measures that largely correspond with tests that have been included in the aforementioned composite measures (Donohue et al., 2014; Jutten et al., 2017; Langbaum et al., 2014). However, our results also indicated that some of the measures included in composites for preclinical AD may actually be less useful for detecting decline in this stage. For instance, the PACC includes the MMSE, and the API includes both the MMSE and BNT, whereas in our data set, these measures were not found to be sensitive to decline before Stage 3.
This suggests that existing composites could potentially be further optimized to track short-term disease progression in Stages 1 and 2. Our findings could be used to select the most sensitive measures in order to optimize composites or create novel composites, which could then be validated in an independent sample.
There are some limitations that should be taken into account. First, our combination of community-based and memory clinic-based cohorts may have led to some heterogeneity in our sample, especially in Stages 1 and 2. With regard to the memory clinic-based cohorts, a potential limitation is that individuals who were labeled as having SCD may include individuals who did not have a memory concern themselves, but went to the memory clinic because a family member or other person expressed such a concern. Additionally, our results might have been biased by other differences between study cohorts, for example regarding amyloid assessment methods, follow-up timeframes, and cross-cultural differences. However, we explored cohort effects on the selected neuropsychological tests in an initial stage of our analyses and found that they did not affect our results. Another potential limitation is that we only looked at measures that were available across cohorts, thereby excluding potentially sensitive measures, such as the Free and Cued Selective Reminding Test (Grober, Veroff, & Lipton, 2018), which was only available in the HABS cohort. However, we chose this approach to prevent results on a specific test from being driven by a single cohort. Last, it should be noted that the majority of our sample is Caucasian, which limits the generalizability of our findings across populations. On the other hand, our study represents a more global sample than other studies that rely on single cohorts, by combining four data sets originating from Europe and the US, and also regional diversity within the US. Another major strength of this study is our large sample of AD biomarker-positive individuals covering the entire AD continuum.
Furthermore, our classification of clinical severity as defined by the NIA-AA framework might have provided a more objective method to classify individuals instead of relying on clinical syndrome diagnoses that could have been biased by cohort (Jack et al., 2019).
To conclude, we demonstrated that commonly used neuropsychological tests differ in their ability to capture short-term cognitive decline and therefore disease progression depending on clinical stage (preclinical to symptomatic) within the AD continuum. This implies that stage-specific cognitive endpoints are needed to accurately assess change over time at different clinical stages of AD. The current study results can provide guidance on the selection of cognitive endpoints when the NIA-AA 2018 framework is applied to define the treatment population. Since only a few existing tests were identified as sensitive in Stages 1 and 2, future directions include the development and validation of optimized composite measures, to capture the subtle cognitive decline observed in those preclinical stages. Furthermore, novel test paradigms that rely on digital assessment and scoring software are being developed, which could potentially aid in detecting and tracking of a greater magnitude of decline in preclinical stages of AD. This will ultimately lead to more accurate and fine-grained measurement and thereby increase the chance of successful evaluation of potentially preventive treatments for AD.
HABS: The Harvard Aging Brain Study is funded by the National Institute on Aging (P01AG036694; Principal Investigators Reisa Sperling, Keith Johnson) with additional support from several philanthropic organizations. KVP (1K23AG053422-01) is supported by a K23 award from NIA and an award from the Alzheimer’s Association.
ADC: The Alzheimer Center Amsterdam is supported by Alzheimer Nederland and Stichting VUMC funds. Research of the Alzheimer Center Amsterdam is part of the neurodegeneration research program of Amsterdam Neuroscience. The clinical database structure was developed with funding from Stichting Dioraphte. The SCIENCe project is supported by a research grant from Gieskes-Strijbis fonds. The present study is supported by a grant from Memorabel (grant no. 733050205), which is the research program of the Dutch Deltaplan for Dementia.
ADNI: Data collection and sharing for this project were funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol, Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann,La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
NACC: The NACC database is funded by NIA/NIH grant U01 AG016976. NACC data are contributed by the NIA-funded ADCs: P30 AG019610 (PI Eric Reiman, MD), P30 AG013846 (PI Neil Kowall, MD), P30 AG062428-01 (PI James Leverenz, MD), P50 AG008702 (PI Scott Small, MD), P50 AG025688 (PI Allan Levey, MD, PhD), P50 AG047266 (PI Todd Golde, MD, PhD), P30 AG010133 (PI Andrew Saykin, PsyD), P50 AG005146 (PI Marilyn Albert, PhD), P30 AG062421-01 (PI Bradley Hyman, MD, PhD), P30 AG062422-01 (PI Ronald Petersen, MD, PhD), P50 AG005138 (PI Mary Sano, PhD), P30 AG008051 (PI Thomas Wisniewski, MD), P30 AG013854 (PI Robert Vassar, PhD), P30 AG008017 (PI Jeffrey Kaye, MD), P30 AG010161 (PI David Bennett, MD), P50 AG047366 (PI Victor Henderson, MD, MS), P30 AG010129 (PI Charles DeCarli, MD), P50 AG016573 (PI Frank LaFerla, PhD), P30 AG062429-01 (PI James Brewer, MD, PhD), P50 AG023501 (PI Bruce Miller, MD), P30 AG035982 (PI Russell Swerdlow, MD), P30 AG028383 (PI Linda Van Eldik, PhD), P30 AG053760 (PI Henry Paulson, MD, PhD), P30 AG010124 (PI John Trojanowski, MD, PhD), P50 AG005133 (PI Oscar Lopez, MD), P50 AG005142 (PI Helena Chui, MD), P30 AG012300 (PI Roger Rosenberg, MD), P30 AG049638 (PI Suzanne Craft, PhD), P50 AG005136 (PI Thomas Grabowski, MD), P30 AG062715-01 (PI Sanjay Asthana, MD, FRCP), P50 AG005681 (PI John Morris, MD), and P50 AG047270 (PI Stephen Strittmatter, MD, PhD).
The authors are very thankful to all patients and participants in the studies included in the paper, as well as to everyone involved in the data collection and data sharing.
CONFLICTS OF INTEREST
RJJ, REA, RFB, MJP, DMR, KAJ, BNMB, PS and RS report no disclosures relevant to this manuscript. SAMS is supported by grants from JPND and Zon-MW and has provided consultancy services in the past 2 years for Nutricia and Takeda. All funds were paid to her institution. GAM has received research salary support from Eisai Inc., Eli Lilly and Company, Janssen Alzheimer Immunotherapy, Novartis, and Genentech. Additionally, GAM has served as a consultant for Grifols Shared Services North America, Inc., Eisai Inc., and Pfizer. WMF holds the Pasman chair. CET has received grants from the European Commission, the Dutch Research Council (ZonMW), the Association of Frontotemporal Dementia/Alzheimer’s Drug Discovery Foundation, The Weston Brain Institute, and Alzheimer Nederland. CET has served on advisory boards of Roche; has received non-financial support in the form of research consumables from ADxNeurosciences and Euroimmun; and has performed contract research for or received grants from Probiodrug, Biogen, Eisai, Toyama, Janssen prevention center, Boehringer, AxonNeurosciences, EIP farma, PeopleBio, and Roche. KVP has served as a paid consultant for Biogen.
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617720000934