Alzheimer’s disease is an irreversible neurodegenerative disorder characterized by progressive impairments in cognition, daily living and social functioning activities [1, Reference Nichols, Szoeke, Vollset, Abbasi, Abd-Allah and Abdela2], and ultimately death [Reference Livingston, Sommerlad, Orgeta, Costafreda, Huntley and Ames3]. To date, five antidementia drugs have been approved by the FDA, and clinical trials of Alzheimer’s disease have shown small effects [Reference Birks and Harvey5]. For over 15 years, no new antidementia drug has been approved to treat Alzheimer’s disease [Reference Cummings, Lee, Ritter, Sabbagh and Zhong6], as most antidementia drug trials have failed [7–9]. Multiple explanations exist for the preponderance of failed clinical trials of Alzheimer’s disease [7–9]. Examples include the need for longer study durations and younger participant age (i.e., under 70) to capture disease progression [Reference Bernick, Cummings, Raman, Sun and Aisen10]. The role of sex differences is unclear since sex is rarely a part of efficacy analyses in clinical trials for Alzheimer’s disease [Reference Canevelli, Quarata, Remiddi, Lucchini, Lacorte and Vanacore11]. Nonetheless, female sex is more common in the population of persons with Alzheimer’s disease, and in clinical trials of Alzheimer’s disease [Reference Canevelli, Quarata, Remiddi, Lucchini, Lacorte and Vanacore11]. Another explanation of failed clinical trials of Alzheimer’s disease may be that the course of cognition in Alzheimer’s disease is heterogeneous [12–16], which makes attaining research and clinical goals particularly challenging. Thus, understanding heterogeneity in Alzheimer’s disease may contribute to clinical trial design and treatment [Reference Carlsson17].
Statistical methods (e.g., latent class modeling) have been used to quantify the extent of symptom heterogeneity in various disorders [Reference Levine and Leucht18], including Alzheimer’s disease [13–16]. These methods identify different groups (termed classes or trajectories) with distinct progressive patterns and profiles [Reference Proust-Lima, Philipps and Liquet19]. Prior study estimates show that most persons with Alzheimer’s disease assume a slow progressive pattern of cognitive decline (72% [Reference Leoutsakos, Forrester, Corcoran, Norton, Rabins and Steinberg13], 76.5% [Reference Geifman, Kennedy, Buchan and Brinton20], 76% [Reference Haaksma, Rizzuto, Leoutsakos, Marengoni, Tan and Olde Rikkert21]), while few persons assume a rapid pattern of cognitive progression (4% [Reference Leoutsakos, Muthen, Breitner and Lyketsos22], 24% [Reference Haaksma, Rizzuto, Leoutsakos, Marengoni, Tan and Olde Rikkert21]). Studies have identified markers of trajectory membership to detect the sources of heterogeneity. For instance, younger age was associated with assuming a trajectory of slower disease progression [Reference Haaksma, Rizzuto, Leoutsakos, Marengoni, Tan and Olde Rikkert21]. However, to date, no study has examined the patterns and profiles of heterogeneity in antidementia medication for Alzheimer’s disease in clinical trials.
The current study aims to empirically quantify heterogeneous groups of cognitive functioning in Alzheimer’s disease and their profiles, based on individual participant data from five randomized clinical trials of donepezil.
We accessed pivotal individual-level participant data of randomized controlled double-blinded trials of donepezil conducted by Eisai Co., Ltd (see eTable 1). Data access was granted following the submission of an a priori analytic plan. The data were analyzed on a secure Internet cloud-based platform (http://www.clinicalstudydatarequest.com). We included trials in which participants with Alzheimer’s disease were assessed with the Alzheimer’s Disease Assessment Scale–Cognitive Subscale (ADAS-Cog; Rosen et al., 1984). Individual-level data were ascertained from participants in five randomized clinical trials with similar follow-up intervals and ADAS-Cog scores [23–27]. Institutional review boards approved each trial, and all trial participants gave written informed consent.
Alzheimer’s disease assessment scale–cognitive subscale (ADAS-Cog)
The ADAS-Cog is a neuropsychological index of the severity of the cognitive symptoms of dementia, and is the gold standard in clinical trials of Alzheimer’s disease [Reference Kueper, Speechley and Montero-Odasso28,Reference Connor and Sabbagh29]. The ADAS-Cog consists of 11 tasks (word recall, word recognition, constructional praxis, orientation, naming objects and fingers, commands, ideational praxis, remembering test instruction, spoken language, word-finding, and comprehension) that include both participant-completed and observer-based assessments. ADAS-Cog total scores range from 0 to 70, with higher scores representing more severe cognitive impairment.
The purpose of the ADAS-Cog is to provide a comprehensive assessment of the extent of cognitive dysfunction in Alzheimer’s disease, whereas the purpose of the widely used Mini-Mental State Examination (MMSE) is to screen for cognitive impairment in the general population. Nonetheless, conversion between MMSE and ADAS-Cog total and change scores is possible (e.g., MMSE total scores of 3, 10, 20, and 30 convert to ADAS-Cog total scores of 64, 48, 24, and 6, respectively) [Reference Levine, Yoshida, Goldberg, Samara, Cipriani and Efthimiou30]. To interpret the results, we consider a four-point difference between groups on the ADAS-Cog as clinically relevant [Reference Rockwood, Fay, Gorman, Carver and Graham31]. Furthermore, meta-analysis has estimated the disease progression rate at 5.5 points per year for a patient population with a mean baseline ADAS-Cog value of 25 [Reference Ito, Ahadieh, Corrigan, French, Fullerton and Tensfeldt32].
At step one of the analysis, we characterized the total study population. At step two, we computed latent class mixed modeling for the total study population as the primary analysis. Latent class mixed modeling consists of model identification, plotting, examining, and labeling the resultant classes. It empirically identifies classes in the total population that may be understood as trajectories or groups. The method groups patients into classes so as to maximize within-group homogeneity and between-group heterogeneity. Namely, the model aims for participants within the same class to resemble one another but differ from members of the other class(es).
Model identification consisted of fitting latent class mixed models for two to six classes to identify the number of classes that best fit the data. Two to six classes were fitted with the assessment week as a linear term and then fitted as a quadratic term. A linear term conceptually implies the course assumes a straight-line of cognitive impairment over time, whereas a quadratic term means that cognitive impairment over time assumes a curvilinear form. Fixed terms in the latent models were trial, sex, age, week, and treatment. Trial was set as a fixed rather than a random effect owing to software limitations. The model with the smallest Bayesian information criterion value was chosen as the most parsimonious (described in eTable 2).
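The selection rule described above can be sketched as follows. This is a minimal illustration, not the study code: the analysis itself was run in R, and the log-likelihoods and parameter counts below are made-up placeholders rather than values from eTable 2. The function names bic and select_model are hypothetical, and using the number of participants as the sample size in the penalty term is an assumption.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 * logL + k * ln(n). Smaller is better."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def select_model(candidates, n_obs):
    """Return (bic_value, label) for the candidate with the smallest BIC."""
    scored = [(bic(c["loglik"], c["n_params"], n_obs), c["label"]) for c in candidates]
    return min(scored)


# Hypothetical fit summaries (illustrative numbers only, NOT from the study).
candidates = [
    {"label": "2-class linear", "loglik": -30500.0, "n_params": 12},
    {"label": "3-class quadratic", "loglik": -30300.0, "n_params": 20},
    {"label": "4-class quadratic", "loglik": -30295.0, "n_params": 25},
]

# 2,191 is the analytic sample size reported in the Results.
best_bic, best_label = select_model(candidates, n_obs=2191)
print(best_label, round(best_bic, 1))
```

In this toy example the four-class model has a slightly higher log-likelihood than the three-class model, but the penalty for its extra parameters outweighs that gain, which is how the criterion favors parsimony.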
Based on the most parsimonious latent class mixed model, each participant was assigned to a class based on posterior probability values. Posterior probabilities exceeding 0.7 are considered the cut-off for good classification [Reference Nagin and Odgers33]. The most parsimonious model was plotted to examine the pattern of cognitive impairment by week, and the characteristics of each class presented. At step three of the analysis, a series of binary logistic models were computed to examine the associations between the study covariates and class membership. Latent class mixed modeling was computed in R using the hlme function [Reference Proust-Lima, Philipps and Liquet19].
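The assignment step can be illustrated with a short sketch. It assumes that each participant is assigned to the class with the highest posterior probability and that the 0.7 cut-off is applied to the average posterior probability within each assigned class; the text does not state which of these conventions was used, so both are assumptions. The posterior values and the function names are hypothetical.

```python
from collections import defaultdict


def assign_classes(posteriors):
    """Return the 1-based index of the most probable class for each participant."""
    return [max(range(len(p)), key=lambda k: p[k]) + 1 for p in posteriors]


def mean_posterior_by_class(posteriors, assignments):
    """Average posterior probability of the assigned class, per class."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for probs, cls in zip(posteriors, assignments):
        totals[cls] += probs[cls - 1]
        counts[cls] += 1
    return {cls: totals[cls] / counts[cls] for cls in counts}


# Hypothetical posterior probabilities for four participants and three classes.
posteriors = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.75, 0.15],
    [0.20, 0.15, 0.65],
]

assignments = assign_classes(posteriors)
means = mean_posterior_by_class(posteriors, assignments)
# Flag classes whose average posterior probability exceeds the 0.7 cut-off.
well_classified = {cls: avg > 0.7 for cls, avg in means.items()}
print(assignments, well_classified)
```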
We replicated the primary analysis above (except without the treatment arm in the models), restricting it first to the donepezil arm and then to the placebo arm.
Table 1 shows that the total analytic sample consisted of 2,191 participants with ADAS-Cog assessments. The average follow-up time was 10.77 (SD = 3.34) weeks. The average participant age at baseline was 72.42 (SD = 7.46). There were 1,339 (61.11%) females, and 852 (38.89%) males. The placebo group consisted of 760 (34.69%) participants, and the donepezil 1,431 (65.31%).
Abbreviations: ADAS-COG, Alzheimer’s disease assessment scale–cognitive subscale; M, mean; SD, standard deviation.
Latent class mixed model
The Bayesian information criterion was examined to identify the number of latent classes (see eTable 2). The best-fitting model consisted of three classes and a quadratic week term. Figure 1 shows that the classes consisted of trajectories of low scorers (i.e., less severe cognitive impairment; N = 1,666, 76.04%), improvers (N = 27, 1.23%), and high scorers (i.e., more severe cognitive impairment; N = 498, 22.73%). Table 1 shows the characteristics of each class. From baseline to the last visit, low scorers increased by approximately 1.46 ADAS-Cog points, improvers by 16.54 points, whereas high scorers changed by −1.39 points (Table 1).
Logistic regression modeling
Next, we used logistic regression models to predict latent class mixed model membership (eTable 3). The results showed that trial participation (except [Reference Burns, Rossor, Hecker, Gauthier, Petit and Moller27]) was significantly associated with membership in the low or high scorer classes, although unrelated to membership in the improvers class. Older age was associated with membership in the low scorer group (OR = 1.02, 95% CI = 1.01). Donepezil compared with placebo was statistically significantly associated with a greater likelihood of membership in the improvers group (OR = 6.88, 95% CI = 2.03, 42.95). Advanced age (OR = 0.98, 95% CI = 0.96, 0.99) and donepezil compared with placebo (OR = 0.79, 95% CI = 0.64, 0.98) were significantly inversely associated with membership in the group of high scorers. Sex consistently had a null effect on class membership.
We replicated the primary analysis exactly as above, but separately for patients randomized to donepezil and placebo. Based on information fit indices, the donepezil group consisted of three classes identifiable from eTable 2 as low scorers (N = 1,078, 75.33%), improvers (N = 21, 1.47%), and high scorers (N = 332, 23.20%). The placebo group consisted of two classes, low scorers (N = 585, 76.97%) and high scorers (N = 175, 23.03%) (Table 2). The class courses are shown in Figure 2.
We fitted binary logistic regression models to predict class membership as in the primary analysis. We restricted the analysis to the group allocated to donepezil and then to the group allocated to placebo, and did not include the treatment term in the model (eTable 3). In the donepezil group analysis, the trial covariate was statistically significantly (p < 0.05) associated with the likelihood of membership in the classes of low and high scorers, but not improvers (eTable 3). Trial had null effects in the sensitivity analysis restricted to the placebo group (eTable 3). In the placebo analysis, advanced age was positively associated with low scorer membership (OR = 1.04, 95% CI = 1.02, 1.06) and negatively associated with high scorer membership (OR = 0.96, 95% CI = 0.94, 0.98). Age was associated with membership in the high scorer class in the donepezil analysis (OR = 0.98, 95% CI = 0.97, 1.00).
Based on individual participant data from five randomized clinical trials of donepezil, we aimed to quantify the extent of heterogeneity of cognitive impairment in Alzheimer’s disease. The results empirically identified three classes: low scorers (N = 1,666, 76.04%), who made up the majority and were characterized by less severe cognitive impairment, improvers (N = 27, 1.23%), and high scorers (N = 498, 22.73%), who were characterized by more severe cognitive impairment. We also examined markers associated with group membership.
A small group of study participants (1.23%), mostly randomized to donepezil, assumed a pattern consistent with amelioration as reflected by the clinically relevant improvement in cognition (i.e., a four-point improvement on the ADAS-Cog) within 12 weeks [Reference Rockwood, Fay, Gorman, Carver and Graham31]. Membership in this class was associated only with donepezil rather than placebo treatment. The lack of other significant markers associated with the class of improvers suggests that concerted efforts are warranted to identify other factors associated with the likelihood of amelioration.
We interpret the results in terms of annual progression rates by converting the ADAS-Cog change scores at week 12 to annual rates, multiplying them by 52/12 (approximately 4.3). This allows the observed changes in the study to be compared with estimates elsewhere [Reference Samtani, Xu, Russu, Adedokun, Lu and Ito34]. In practice, because the model includes a quadratic term, it is difficult to extrapolate. Accordingly, the annual rates presented here assume a constant rate of change. As seen in Table 1, the low scorer group had a change score of 1.46, which corresponds to a crude estimated annual rate of 6.28 (95% CI = 5.43, 7.12). This estimate is close to the expected disease progression rate of 5.5 ADAS-Cog points per year, which lies within its confidence interval [Reference Ito, Ahadieh, Corrigan, French, Fullerton and Tensfeldt32]. This is unlike the class of high scorers (−5.98, 95% CI = −8.06, −3.89). Over 12 weeks, the improvers group had an average change score of 16.54 (95% CI = 15.12, 17.96). It is unlikely that such a rapid change extends to a year, but considerable improvement may occur for a subgroup in the population, which warrants future research. Hence, the point estimates for the high scorers and improvers are inconsistent with the standard Alzheimer’s disease progression model [Reference Ito, Ahadieh, Corrigan, French, Fullerton and Tensfeldt32]. This interpretation, albeit crude, underscores the importance of understanding heterogeneity in Alzheimer’s disease.
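The crude annualization can be reproduced with a few lines of arithmetic. The sketch below assumes the rounded factor of 4.3 was used, since 1.46 × 4.3 and −1.39 × 4.3 round to the reported 6.28 and −5.98; the function name annualize is hypothetical, and the constant-rate assumption is the one stated in the text.

```python
# Rounded conversion factor from a 12-week change to an annual rate (52 / 12).
ANNUAL_FACTOR = round(52 / 12, 1)  # 4.3


def annualize(change_score, factor=ANNUAL_FACTOR):
    """Scale a 12-week ADAS-Cog change score to a crude annual rate,
    assuming a constant rate of change over the year."""
    return change_score * factor


# 12-week change scores reported in Table 1.
low_scorers = round(annualize(1.46), 2)
high_scorers = round(annualize(-1.39), 2)
improvers = round(annualize(16.54), 2)

print(low_scorers, high_scorers, improvers)
```

Applied to the improvers, the same scaling gives a magnitude above 70 points, which is larger than the 0 to 70 range of the ADAS-Cog and illustrates why a constant-rate extrapolation is implausible for that class.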
The results showed that select profiles were associated with group membership. In the primary analysis, trial was associated with the low scorer and high scorer classes. However, in the sensitivity analysis, this effect was replicated in the donepezil group and not in the placebo group. This suggests that, across trials, heterogeneity is more of a challenge in the treatment arm than in the placebo arm [Reference Devi and Scheltens35]. Similarly, younger age was associated with membership in the high scorer class in the primary analysis and in the analysis restricted to the donepezil group, but not the placebo group. These results are in line with prior research on age [Reference Bernick, Cummings, Raman, Sun and Aisen10]. Hence, the results illustrate that age and trial play a role in heterogeneity.
Sex had a null association with class membership across all models. This is consistent with prior observations that sex appears not to play a role in treatment efficacy in Alzheimer’s disease [Reference Canevelli, Quarata, Remiddi, Lucchini, Lacorte and Vanacore11]. Nonetheless, because of the sex distribution in Alzheimer’s disease, further consideration of this issue is warranted.
Limitations and conclusions
There are several limitations to our study. First, as the results are based on clinical trial data with inclusion criteria, they may have restricted generalizability. Evidence indicates that clinical trial selection criteria restrict generalizations from clinical trial data to the general population [Reference Malmivaara36,Reference Canevelli, Bruno, Vanacore, de Lena and Cesari37]. Accordingly, caution is warranted regarding the generalizability of the current results to clinical treatment settings. To inform clinical practice, replicating the results in large-scale naturalistic studies with more extended periods of observation may be appropriate. Second, the trials had unequal assessment intervals and were not designed to assess heterogeneity in the trajectories of long-term cognitive decline (eTable 1). Had they been, the results might have differed. Third, some factors could be associated with the profiles beyond those we examined (e.g., years of education). Unfortunately, the data common to all the trials did not contain such other information. Hence, our study suffers from residual confounding, and future research may wish to examine more potential predictors of heterogeneity. In addition, our results are restricted to donepezil and placebo. Research is warranted to examine the generalizability of these findings to other antidementia drugs. Fourth, the study duration was restricted to 12 weeks of follow-up. Given the course of cognitive decline in Alzheimer’s disease, research is warranted with longer study durations.
Fifth, we accounted for the trial as a covariate in the statistical analyses since the study data came from five randomized clinical trials. The trials had different visit schedules, follow-up intervals, and selection criteria (eTable 1); hence consideration is warranted regarding trial design [23–27]. The trial covariate was statistically significantly associated with membership in the high and low scorer classes but not the improvers class (eTable 3). Hence, although the trial was accounted for as a covariate, and using more trials means more variability and thus greater generalizability, consideration is warranted given our use of multiple trial designs.
Sixth, we used latent class analysis to identify heterogeneity in the course of cognition. The purpose of using this method was to scrutinize how trajectories in cognition unfold with time. Alternative statistical approaches, which do not examine how heterogeneity in cognition unfolds over time, such as machine learning, hold great potential for identifying subgroups in Alzheimer’s disease [Reference Ezzati and Lipton38]. Seventh, multiple other sensitivity analyses could have been computed. For example, had the analysis been conducted sequentially by trial, we would likely have introduced excess type II error. In addition, the improvers were a small subgroup and would likely not have been uncovered in an analysis by trial. Instead, the improvers were uncovered in the analysis of the donepezil arm and not the placebo arm. In sum, the large sample based on five trials afforded us the ability to uncover a heterogeneity source in the form of an otherwise hidden subgroup.
Eighth, the class of improvers in the results is small, which limits the clinical impact of our results. However, it is not uncommon for a small segment of the population to have a disproportionate impact. Examples include the disproportionately high global burden of schizophrenia [Reference Charlson, Ferrari, Santomauro, Diminic, Stockings and Scott39], and evidence that 80% of the health burden is attributable to 20% of cases [Reference Caspi, Houts, Belsky, Harrington, Hogan and Ramrakha40].
Among the strengths of the current study are the use of five pivotal clinical trials and the large number of participants, which support the robustness of the analysis. Clinically, the results identify three courses in Alzheimer’s disease based on ADAS-Cog scores over 12 weeks: low scorers (76.04%), whose rate of ADAS-Cog progressive decline resembles the average rate of decline and who are characterized by placebo treatment and younger age; improvers (1.23%), who had a marked ADAS-Cog amelioration; and high scorers (22.73%), characterized by advanced age. Clinical trial and age were associated with class membership in the donepezil arm. This suggests that clinical trials of Alzheimer’s disease may need to be more targeted to reduce trial heterogeneity, at the expense of generalizability. Based on a state-of-the-art statistical analysis of five pivotal clinical trials of donepezil for Alzheimer’s disease, the current study contributes to the literature by documenting the extent and profiles of heterogeneity in Alzheimer’s disease under placebo or donepezil for up to 12 weeks.
Joint first authors: Goldberg and Levine. We acknowledge Eisai Co., Ltd. for providing us with the study data. Eisai Co., Ltd. did not provide study design, critical input, or manuscript review for the study. We acknowledge http://www.clinicalstudydatarequest.com for hosting the study data.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. Cipriani is supported by the National Institute for Health Research (NIHR) Oxford Cognitive Health Clinical Research Facility, by an NIHR Research Professorship (grant RP-2017-08-ST2–006), by the NIHR Oxford and Thames Valley Applied Research Collaboration, and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005). The views expressed are those of the authors and not necessarily those of the UK National Health Service, the NIHR, or the UK Department of Health.
Conflict of Interests
Drs. Levine, Yoshida, Samara, and Goldberg have nothing to disclose. Dr. Iwatsubo has served as a consultant for Eisai, Roche, and Biogen in the last 3 years. Dr. Cipriani has received research and consultancy fees from INCiPiT (Italian Network for Pediatric Trials), CARIPLO Foundation, and Angelini Pharma. Dr. Leucht has received honoraria as a consultant or for lectures for LB Pharma, Otsuka, Lundbeck, Boehringer Ingelheim, LTS Lohmann, Janssen, Johnson&Johnson, TEVA, MSD, Sandoz, SanofiAventis, Angelini, Sunovion, Recordati, and Gedeon Richter. Dr. Furukawa reports personal fees from Mitsubishi-Tanabe, MSD, and Shionogi, and a grant from Mitsubishi-Tanabe, outside the submitted work; TAF has a patent 2018–177688 pending.
Furukawa: Critical manuscript feedback, statistical review, study conceptualization, mentorship.
Levine: Manuscript drafting, statistical analysis, data management, study conceptualization.
Yoshida: Critical manuscript feedback, data management, statistical analysis.
Goldberg: Critical manuscript feedback, statistical analysis, study conceptualization.
Samara: Study conceptualization, interpretation, critical manuscript feedback.
Cipriani: Study conceptualization, interpretation, critical manuscript feedback.
Iwatsubo: Study conceptualization, interpretation, critical manuscript feedback.
Leucht: Study conceptualization, interpretation, critical manuscript feedback.
Data Availability Statement
Data are available based on a request to http://www.clinicalstudydatarequest.com.
To view supplementary material for this article, please visit http://dx.doi.org/10.1192/j.eurpsy.2021.8.