It has been increasingly recognized that major depressive disorder is more often than not a chronic and relapsing disorder (Furukawa et al., 2000; Kanai et al., 2003; Furukawa et al., 2009). As a result, there have been many attempts to determine the characteristics of patients who do not respond to treatment (Bagby et al., 2002).
However, in the real world, the illness course of major depression is highly variable and a substantial minority of patients do show complete remission from a major depressive episode (Kessler et al., 2017). It would help practicing clinicians a great deal to know the baseline demographic and clinical characteristics of such patients and to be able to identify them with satisfactory confidence at an early stage of treatment. Remission to a completely euthymic state, rather than response or improvement in depression severity, has now been proposed as a desirable and achievable goal of treatment for patients with major depression (Nierenberg and Wright, 1999; Keller, 2003; Nierenberg, 2013).
Studies of the course of major depression have identified, although often inconsistently, the following demographic, clinical or psychosocial predictors of poor response: older age, unemployment, low education, unmarried status, high baseline severity, longer duration of episode, greater number of previous episodes, younger age of onset, comorbid personality disorder, comorbid anxiety disorder, comorbid substance use disorder, poor social support and poor physical functioning among others (Bagby et al., 2002; Kessler et al., 2017). Observations early in the course of the treatment can also be informative: early improvement within 1–3 weeks of treatment has been found repeatedly to be a predictor of good outcomes (Katz et al., 2004; Henkel et al., 2009; Szegedi et al., 2009; Tadic et al., 2010).
However, it is not known whether any combination of these factors has enough discriminatory power to be used in clinical practice: even when each predictor is statistically significant, if the positive predictive value (PPV) of the model is, for example, 50% or lower, such a prediction model cannot be used in clinical practice. Unfortunately, as far as the current authors are aware, no study has built, and examined the performance of, a prediction model of remission using appropriate psychometric methodology. There are a number of salient weaknesses in the available literature. First, most if not all studies suffer from a substantial loss to follow-up, and these dropouts are often simply ignored in complete case analyses or handled inappropriately with the last-observation-carried-forward method (Little and Rubin, 2002). It is very important in studies of disease prognosis to limit the loss to follow-up as much as possible and, in the case of unavoidable missing data, to use appropriate imputation methods such as multiple imputation (MI) (Sterne et al., 2009). Second, the science of prediction has advanced and been refined considerably in the past decade, so that we now have a consensus methodology for designing the study, collecting the data, analysing the dataset and reporting the results (Collins et al., 2015a; Debray et al., 2017). There is growing consensus that the model must be developed using multivariable analyses and that it must undergo external validation.
Such properly developed prediction models are expected to play greater roles in informing decision making at various stages in the clinical pathway (Rabar et al., 2012; Goff et al., 2014).
We have conducted a pragmatic megatrial examining first- and second-line treatments for untreated non-psychotic major depression that involved 2011 patients and followed them up to 25 weeks with a follow-up rate of 95.0%. This study is a secondary analysis of this dataset to delineate the demographic and clinical characteristics of the remitters to acute phase antidepressant treatment and to examine if and how we can predict them based on such variables. The prediction model will be built and examined using the recommended methodology.
Study and the participants
SUN☺D is a 25-week, multi-centre, parallel-group, assessor-blinded, pragmatic megatrial. The details of the study procedure and the results are reported elsewhere (Furukawa et al., 2011; Kato et al., 2018). In brief, it involved two randomizations. The first was a cluster randomization by site at week 1, which determined whether the first-line treatment with sertraline was titrated up to the minimum or the maximum of the licensed dosage. The second was an individual randomization that allocated the participants who had not remitted by week 3 to continue sertraline, to augment it with mirtazapine, or to switch to mirtazapine. The primary outcome was the score of the Patient Health Questionnaire-9 (PHQ-9) at week 9.
This study is a secondary analysis of the course of the patients participating in the SUN☺D pragmatic trial. In this study, we focus on those who show complete remission after the acute phase treatment.
Participants were eligible when they (i) suffered from non-psychotic unipolar major depression according to DSM-IV in the past month, as ascertained by the clinician using the semi-structured interview Primary Care Evaluation of Mental Disorders (PRIME-MD) (Spitzer et al., 1994), (ii) were of either sex and aged between 25 and 75 years, (iii) had not been treated with antidepressants, antipsychotics or mood stabilizers in the past month, (iv) were deemed suitable to start treatment with sertraline by the clinician, and (v) had provided informed consent. Exclusion criteria included comorbidity with psychotic disorders, personality disorders and substance dependence. More details of the inclusion as well as exclusion criteria are provided in the protocol (Furukawa et al., 2011).
The first-line treatment consisted of sertraline started at 25 mg/day, then titrated to either 50 or 100 mg/day by week 3, according to the cluster randomization by site. Those who remitted by week 3 (defined as scoring 4 or less on PHQ-9 at week 3) continued with their allocated first-line treatment. Those who had not remitted were randomized 1:1:1 at week 3 to continue sertraline, to augment sertraline with mirtazapine, or to switch to mirtazapine. These second-line treatments were continued up to week 9. After week 9, the treatment was at the discretion of the physicians, and the final assessment was made at week 25.
Co-administration of non-protocol antidepressants, antipsychotics or mood stabilizers was prohibited up to week 9; anxiolytics and hypnotics were permitted. After week 9, the treatments were at the study physicians’ discretion and there were no prohibited treatments.
Before entry to the study, the physicians gathered information about the baseline demographic as well as clinical characteristics of the patients. After entry into the study, trained interviewers assessed the participants with the PHQ-9 and Frequency, Intensity, and Burden of Side Effects Rating (FIBSER) by telephone at weeks 1, 3, 9 and 25. The inter-rater reliability of the assessors as well as the success of blinding of the assessors have been ascertained (Shimodera et al., 2012; Kato et al., 2018).
Patient Health Questionnaire-9
PHQ-9 consists of the nine diagnostic criteria items of a major depressive episode of the DSM-IV. Each item is rated from 0 = ‘Not at all’ to 3 = ‘Nearly every day’, and the total score ranges between 0 and 27. The scores are interpreted clinically (Kroenke et al., 2001) as indicating
0–4: No depression
5–9: Mild depression
10–14: Moderate depression
15–19: Moderately severe depression
20–27: Severe depression
Good reliability, validity as well as sensitivity to change have been documented (Furukawa, 2010).
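As an illustration only, the scoring and interpretation rules above can be sketched in Python. The function names are ours and hypothetical; they are not part of the PHQ-9 itself or of the trial software. The remission rule (a total of 4 or less) is the definition used in this study.

```python
def phq9_total(item_scores):
    """Sum nine PHQ-9 item ratings, each an integer from 0 to 3."""
    if len(item_scores) != 9:
        raise ValueError("PHQ-9 has exactly nine items")
    if any(s not in (0, 1, 2, 3) for s in item_scores):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    return sum(item_scores)


def phq9_severity(total):
    """Map a PHQ-9 total score (0-27) to the severity band listed above."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ-9 total must lie between 0 and 27")
    if total <= 4:
        return "No depression"
    if total <= 9:
        return "Mild depression"
    if total <= 14:
        return "Moderate depression"
    if total <= 19:
        return "Moderately severe depression"
    return "Severe depression"


def is_remitted(total):
    """Remission as defined in this study: PHQ-9 total of 4 or less."""
    return total <= 4
```

For example, `phq9_severity(phq9_total([1] * 9))` returns the band for a total score of 9.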
Beck Depression Inventory, Second edition
The participants were also asked to fill in the Beck Depression Inventory, second edition (BDI-II) on a bi-weekly basis when they visited the clinicians. BDI-II is a 21-item self-report measure of depression severity. The total score ranges between 0 and 63. Two subscales based on ‘cognitive’ and ‘non-cognitive’ factors have been proposed (Beck et al., 1996). Excellent reliability, validity as well as sensitivity to change have been reported (Furukawa, 2010).
Frequency, Intensity, and Burden of Side Effects Rating
FIBSER was originally used in a large NIMH-funded depression trial as a global rating scale for side effects which assesses the frequency, intensity and burden of side effects, each on a seven-point scale between 1 and 7. The total score therefore ranges between 3 and 21, with higher ratings indicating greater severity (Rush et al., 2006).
Adherence was measured as the number of days that the patient reported having taken the study medication.
We defined remitters as scoring 4 or less on PHQ-9, which was the primary outcome measure in the original megatrial. We first compared remitters v. non-remitters at week 9 with regard to the baseline demographic as well as clinical characteristics. Missing data were imputed by way of MI, using chained equations under the assumption that data were missing at random. Fifty multiply imputed datasets were created, using sex, age, education, employment, marital status, age of onset for depression, number of depressive episodes, length of index episode, PHQ-9, BDI-II, BDI-II subscales (Beck et al., 1996), FIBSER and adherence as predictors. Rubin's rules were used to pool the regression coefficient estimates from the imputed datasets (Rubin, 1987). The association was expressed as an odds ratio (OR) with its 95% confidence interval (CI).
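The pooling step can be sketched as follows. This is a minimal illustration of Rubin's rules in Python, not the Stata procedure used for the analyses; the function names are hypothetical. As a simplifying assumption, the confidence interval uses a normal quantile (1.96) rather than the t reference distribution with Rubin's degrees of freedom.

```python
import math


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates (e.g. log odds ratios) by Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m                                   # mean within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


def odds_ratio_ci(log_or, se, z=1.96):
    """Back-transform a pooled log odds ratio to an OR with an approximate 95% CI."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With fifty imputations, `estimates` and `variances` would each hold fifty values taken from the logistic regressions fitted to the individual imputed datasets.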
Secondly, we examined whether we could predict remitters at week 9 by entering all the predictors into one model. As predictors, we used three successively richer sets, namely (i) all the demographic and clinical variables at baseline as listed above, (ii) plus the clinical variables by week 1 and treatment allocation at week 1, and (iii) plus the clinical variables by week 3 and treatment allocation at week 3. We then used manual MI-stepwise logistic regression with backward selection, with the p value for removal set at 0.10 (Wood et al., 2008; Chen and Wang, 2013), while also considering the clinical importance, clinical convenience and collinearity. We calculated variance inflation factors (VIFs) in order to ascertain that the obtained models did not suffer from multi-collinearity.
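To make the collinearity check concrete, the sketch below computes VIFs in plain Python using the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix. It is an illustration under that identity, not the Stata routine used in the study, and all function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centred = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centred]
    if any(norm == 0 for norm in norms):
        raise ValueError("a predictor is constant")
    k = len(columns)
    return [[sum(a * b for a, b in zip(centred[i], centred[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]
```

Each element of `columns` is one predictor (for example, age or PHQ-9 at week 1) across all participants; a predictor that is nearly a linear combination of the others yields a large VIF.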
In order to avoid overfitting the data to the sample and to ascertain the external validity of the prediction model thus obtained, we split the sample by the median date of enrolment (temporal validation) (Collins et al., 2015a). We used the first half of the total cohort as the derivation set to build the prediction model, and then examined its prediction performance on the second half of the cohort as a validation set.
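A temporal split of this kind might be sketched as below. The record layout and the field name `enrolled_on` are hypothetical, and assigning participants enrolled exactly on the median date to the derivation set is our assumption; the source does not state how such ties were handled.

```python
from datetime import date
from statistics import median


def temporal_split(records, date_key="enrolled_on"):
    """Split records into derivation and validation sets at the median enrolment date.

    records: list of dicts, each holding a datetime.date under date_key.
    Records enrolled on or before the median date go to the derivation set
    (an assumption), later ones to the validation set.
    """
    cut = median(r[date_key].toordinal() for r in records)
    derivation = [r for r in records if r[date_key].toordinal() <= cut]
    validation = [r for r in records if r[date_key].toordinal() > cut]
    return derivation, validation
```

The model is then fitted on `derivation` only and evaluated, without refitting, on `validation`.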
Because the clinical focus of prediction was to see if the screening-positive population would eventually turn out to be true remitters, in building and assessing the prediction model, we tried to maximize the PPV, while not unduly sacrificing the size of the screening-positive population (assessed by the sensitivity of the prediction model) or the overall discrimination [assessed by the area under the receiver-operating characteristics curve (AUC)] and calibration (assessed by calibration plots and goodness-of-fit statistics).
In the validation sample, we examined AUC, goodness-of-fit statistics, calibration plots, PPV, negative predictive value (NPV), sensitivity and specificity of the prediction model against the remitters at week 9 as well as those at week 25.
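The classification and discrimination measures named above can be computed as in the following sketch. The function names are hypothetical, treating a predicted probability at or above the threshold as a positive prediction is our assumption, and the AUC is obtained from its rank interpretation (the probability that a randomly chosen remitter receives a higher predicted probability than a randomly chosen non-remitter, with ties counted as one half).

```python
def classification_metrics(probabilities, outcomes, threshold=0.70):
    """PPV, NPV, sensitivity and specificity at a given post-test probability cut-off.

    probabilities: predicted probabilities of remission
    outcomes: 1 (or True) for observed remission, 0 (or False) otherwise
    """
    tp = fp = tn = fn = 0
    for p, y in zip(probabilities, outcomes):
        predicted_positive = p >= threshold  # assumption: ">=" counts as positive
        if predicted_positive and y:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    return {
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }


def auc(probabilities, outcomes):
    """Area under the ROC curve via pairwise comparison of remitters and non-remitters."""
    positives = [p for p, y in zip(probabilities, outcomes) if y]
    negatives = [p for p, y in zip(probabilities, outcomes) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one remitter and one non-remitter")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in positives for b in negatives)
    return wins / (len(positives) * len(negatives))
```

The pairwise AUC is quadratic in sample size, which is acceptable for a validation set of about 1000 participants.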
We conducted all statistical analyses with STATA Version 15.1 (College Station, TX, USA).
Participants, interventions and assessments
Figure 1 shows the screening, randomization and follow-up of the study participants. Between December 2010 and March 2015, 56261 first-visit patients to the participating 48 clinics and hospitals in Japan underwent eligibility assessment, of whom 7895 suffered from untreated unipolar major depressive episodes. Of these, 2011 patients satisfied eligibility criteria, provided informed consent and were enrolled into SUN☺D.
At week 1, 970 participants were allocated to the 50 mg/day and 1041 to the 100 mg/day arms by cluster randomization. In the 50 mg/day arm, 91.7% had been prescribed 50 mg/day, 0.1% 37.5 mg/day, 1.3% 25 mg/day and 0.1% 75 mg/day by week 3; in the 100 mg/day arm, 82.0% had reached 100 mg/day, 5.3% 75 mg/day, 6.7% 50 mg/day and 0.9% 25 mg/day. In the 50 mg/day arm, 6.8% had stopped treatment as had 5.1% in the 100 mg/day arm.
Of all enrolled patients, 1953 (97.1%) completed telephone assessment at week 3, at which point 230 had remitted and continued on their allocated sertraline dose. Of those who had not remitted, 1647 were randomized to continue sertraline (n = 551), augment sertraline with mirtazapine (n = 538) or switch to mirtazapine (n = 558). Of the initial cohort randomized at week 1, 1927 (95.8%) and 1910 (95.0%) were successfully followed up at weeks 9 and 25, respectively.
In total, 37.0% (95% CI 34.8–39.1%) of the original cohort were remitted at week 9.
Table 1 shows the ORs for the association between the baseline predictors and the remission status at week 9. Older age, longer education, married status, older age at onset, shorter length of index episode as well as lower depression severity at weeks 0, 1 and 3 and fewer adverse effects at weeks 1 and 3 were significantly associated with remission.
Notes to Table 1: Numbers are mean (95% confidence interval) or percentage (95% confidence interval). BDI-II, Beck Depression Inventory, second edition; FIBSER, Frequency, Intensity, and Burden of Side Effects Rating; PHQ-9, Patient Health Questionnaire-9.
Prediction models in the derivation set
We next constructed prediction models applying MI-stepwise logistic regression to the derivation set (n = 1009).
The prediction model based on the baseline (week 0) data only did not perform satisfactorily (Table 2a). The model included age, education, length of index episode and depression severity at baseline but the overall AUC was only 0.66. Even when the cut-off post-test probability for positive prediction was set at 0.70, PPV was 0.67; moreover, only 1% of the true remitters at week 9 were predicted to be so at baseline.
Notes to Table 2:
(a) Model with baseline data only: AUC = 0.66, PPV = 0.67 at cut-off post-test probability of 0.70 with sensitivity of 0.01 (i.e. with 1% of the final remitters correctly identified).
(b) Model with data up to week 1: AUC = 0.75, PPV = 0.80 at cut-off post-test probability of 0.70 with sensitivity of 0.16 (i.e. with 16% of the final remitters correctly identified).
(c) Model with data up to week 3: AUC = 0.85, PPV = 0.83 at cut-off post-test probability of 0.70 with sensitivity of 0.40 (i.e. with 40% of the final remitters correctly identified).
AUC, area under the curve; BDI-II, Beck Depression Inventory, second edition; FIBSER, Frequency, Intensity, and Burden of Side Effects Rating; PHQ-9, Patient Health Questionnaire-9; PPV, positive predictive value; VIF, variance inflation factor (95% CI in parentheses).
The prediction model based on data up to week 1 performed better (Table 2b). The final model included age, education, length of index episode, depression severity at baseline and at week 1, and the total burden of side effects at week 1. The overall AUC was 0.75, and PPV reached 0.80 when the cut-off post-test probability was set at 0.70. However, it was possible to identify only 16% of the true remitters as such.
When we included data up to week 3, the prediction performance improved further (Table 2c). The final model included age, education, length of index episode as before but now only depression severity measures at week 3. The AUC was now 0.85; at the cut-off post-test probability of 0.70, PPV was 0.83, allowing 40% of the final remitters to be identified.
None of the VIFs in the three models was >5, suggesting that the obtained models did not present with a problem in multi-collinearity.
External validation of the prediction model in the validation set
Only the models using data up to week 1 or to week 3 were tested for external validity (Table 3). When these two prediction models were applied to the validation set, discrimination was still satisfactory, with AUC of 0.73 (95% CI 0.70–0.77) and 0.82 (0.79–0.85) for the models up to week 1 and up to week 3, respectively. Figure 2 shows calibration plots for the two models: the predicted and the observed matched closely, with no statistically significant Hosmer–Lemeshow statistics (p = 0.41 and p = 0.29, respectively) for this large validation set (n = 1002).
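For readers unfamiliar with the goodness-of-fit statistic mentioned above, the sketch below computes a Hosmer–Lemeshow-type statistic by grouping participants into equal-sized bins of predicted probability and comparing observed with expected numbers of remitters. It is an illustration only: the function name is hypothetical, the binning may differ in detail from the Stata implementation, and the p value is omitted because it requires a chi-squared distribution function that the Python standard library does not provide.

```python
def hosmer_lemeshow_statistic(probabilities, outcomes, groups=10):
    """Hosmer-Lemeshow-type statistic over equal-sized bins of predicted probability.

    probabilities: predicted probabilities of remission
    outcomes: 1 for observed remission, 0 otherwise
    Returns the sum over bins of (observed - expected)^2 / (n * p * (1 - p)),
    where p is the mean predicted probability in the bin.
    """
    # Sort by predicted probability only, so ties are not ordered by outcome.
    pairs = sorted(zip(probabilities, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        variance = size * mean_p * (1 - mean_p)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic
```

Small values indicate that observed and predicted remission counts agree within each bin, which is what a non-significant test result reflects.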
Notes to Table 3: AUC, area under the curve; NPV, negative predictive value; PPV, positive predictive value (95% CI in parentheses).
Setting the threshold for positive prediction at 0.70, the models showed similar performance to predict remitters as in the derivation set. Using the data up to week 1, PPV (i.e. proportion of true remitters among the positively predicted) remained at 0.74 (0.64–0.83); using the data up to week 3, PPV improved to 0.83 (0.76–0.88). The sensitivity (i.e. proportion of positive prediction among the true remitters) was 0.17 (0.13–0.21) and 0.36 (0.31–0.42).
The models can be used to predict non-remitters as well. At the cut-off post-test probability of 0.30, the model with data up to week 1 showed NPV (i.e. proportion of true non-remitters among the negatively predicted) of 0.81 (0.77–0.85); that with data up to week 3 NPV of 0.84 (0.81–0.87). The specificity (i.e. proportion of negative prediction among the true non-remitters) was 0.55 (0.51–0.59) and 0.68 (0.65–0.72).
It is interesting to note that, using the same prediction model, we can predict the remitters at week 25 (remission rate: 51.8%, 95% CI 49.5–54.0%) with similar accuracy, with AUC of 0.69 (0.66–0.72) and 0.75 (0.72–0.78) for the models up to week 1 and up to week 3, respectively. The PPV for remission at week 25 based on the data up to week 1 was 0.87, and that based on data up to week 3 was 0.86. In other words, if the predictions based on age, education, length of episode and depression severity were positive, we can be fairly confident that such patients would remit by week 9 or, at least eventually, by week 25. The calibration for predicting week 25 remission was poor, mainly because more participants reached remission at week 25 than predicted by the models predicting remission at week 9 (eTable 1 and eFigure 1 in the online Supplementary material).
The prediction models
The final prediction models based on data up to week 1 and on data up to week 3 were as follows:
• logit by week 1 data = −0.841 + 0.059 × PHQ9 at week 0 + 0.028 × age + 0.087 × education(years) − 0.045 × length of episode(months) − 0.076 × PHQ9 at week 1 − 0.056 × BDI2 at week 1 − 0.056 × FIBSER at week 1
• logit by week 3 data = 0.343 + 0.029 × age + 0.080 × education(years) − 0.037 × length of episode(months) − 0.176 × PHQ9 at week 3 − 0.061 × BDI2 at week 3
where post-test probability is obtained by exp(logit)/[1 + exp(logit)]. The Excel spreadsheet to calculate the post-test probability is provided on our department homepage at http://ebmh.med.kyoto-u.ac.jp/toolbox.html and also attached to this article as an electronic supplement.
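The two equations can be turned into a small calculator as sketched below. This is not the authors' spreadsheet: the dictionary keys and function names are hypothetical, the week 3 intercept is taken to be 0.343 (our reading of the printed value), and age is assumed to be in years. The three-way labelling uses the cut-offs of 0.70 and 0.30 reported in the validation results; whether the boundaries are inclusive is our assumption.

```python
import math

# Coefficients transcribed from the two equations above.
WEEK1_COEFFICIENTS = {
    "intercept": -0.841,
    "phq9_week0": 0.059,
    "age": 0.028,
    "education_years": 0.087,
    "episode_months": -0.045,
    "phq9_week1": -0.076,
    "bdi2_week1": -0.056,
    "fibser_week1": -0.056,
}

WEEK3_COEFFICIENTS = {
    "intercept": 0.343,  # assumption: our reading of the printed intercept
    "age": 0.029,
    "education_years": 0.080,
    "episode_months": -0.037,
    "phq9_week3": -0.176,
    "bdi2_week3": -0.061,
}


def logit(coefficients, patient):
    """Linear predictor: intercept plus coefficient times value for each variable."""
    return coefficients["intercept"] + sum(
        weight * patient[name]
        for name, weight in coefficients.items()
        if name != "intercept"
    )


def post_test_probability(coefficients, patient):
    """exp(logit) / [1 + exp(logit)], as defined in the text."""
    value = logit(coefficients, patient)
    return math.exp(value) / (1 + math.exp(value))


def label_prediction(probability, upper=0.70, lower=0.30):
    """Three-way label using the positive and negative cut-offs from the results."""
    if probability >= upper:
        return "predicted remitter"
    if probability < lower:
        return "predicted non-remitter"
    return "indeterminate"


# Hypothetical patient, for illustration only.
example = {"age": 50, "education_years": 14, "episode_months": 3,
           "phq9_week3": 5, "bdi2_week3": 10}
probability = post_test_probability(WEEK3_COEFFICIENTS, example)
print(round(probability, 2), label_prediction(probability))
```

For the hypothetical patient above the logit is 1.312, which places the predicted probability above the 0.70 cut-off.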
The biggest inception cohort to date to study the outcome of patients undergoing antidepressant therapy for an untreated episode of major depression revealed that, if we include observations up to week 1 or 3 after commencement of therapy, we can have reasonably satisfactory and usable prediction models to predict remission at the end of acute phase treatment. The same models were able to predict remission after 25 weeks as well.
Older age, higher education, married status, shorter duration of episode, older age of onset and milder initial depression severity were associated with remission in our univariable analyses. Older age has often been associated with poorer prognosis (Bagby et al., 2002; Kessler et al., 2017); in our cohort of patients with major depression without comorbidities, older age predicted better prognosis. When built into a multivariable prediction model, age, education and length of episode along with depression severity emerged as independent predictors. This suggests that the univariable associations of marital status and age of onset may have been confounded by these factors.
The performance of the prediction models improved when we included depression severity in the early course of treatment. The added predictive value of depression severity in the first 1–3 weeks of treatment is in line with the literature (Katz et al., 2004; Henkel et al., 2009; Szegedi et al., 2009; Tadic et al., 2010). Indeed, only the models incorporating data up to week 1 or 3 demonstrated satisfactory performance in the derivation set. In the external validation set, the discrimination of these models was good, with AUC between 0.73 and 0.82, and the calibration was excellent, as shown in the calibration plots (Fig. 2). When the model prediction is positive after 1–3 weeks of initial treatment, one can be 70–80% sure that the patient will remit within 9 weeks or, at least, by 25 weeks. Such information will be very encouraging for both patients and clinicians in actual practice.
The treatments did not emerge as strong predictors. In the model using data up to week 1, when patients were randomized to either 50 mg or 100 mg/day of sertraline, the treatment allocation did not emerge as a significant predictor. This finding is in line with the results of the original randomized controlled trial (RCT), which found that there was no difference in PHQ-9 scores at week 9 between these two arms (Kato et al., 2018). In the model using data up to week 3, when non-remitted patients were randomized to continue sertraline, augment it with mirtazapine or switch to mirtazapine, the original RCT found small but statistically significant superiority of the augmentation or switching strategies over the continuation in terms of the PHQ-9 scores at week 9 among the non-remitters (Kato et al., 2018). In the current analyses, augmentation or switching emerged as significant predictors in the initial steps of variable selection; however, when PHQ-9 scores at week 3 were included, they were no longer statistically significant. In other words, PHQ-9 at week 3 was a stronger predictor of remission at week 9 than changing the treatments among the non-remitters.
The study has some limitations. First, although the model showed good overall discrimination and satisfactory PPV when the post-test threshold of positive prediction was set at 0.70, it was only able to identify a minority (30–40%) of the actual remitters. This limitation is well illustrated by Fig. 2: the probability of accurate prediction is high to the right of the 7th decile; however, there are always patients who are less likely to remit but still do remit to the left of the 7th decile. The users of the model using this threshold need to be aware that there are patients who still remit even when they are negatively predicted below this threshold. Second, we were unable to examine variables that were used as exclusion criteria or not measured originally in the SUN☺D trial. Among such were personality disorders, substance use disorders, anxiety disorders, social support and social functioning. The prediction performance may have improved had we measured these variables at baseline. However, the set of variables in the current study represents the minimum set clinicians would be measuring in daily practice and serves to indicate which variables to look for in the case of non-complicated major depression. Third, the findings would apply to chronic or non-chronic drug-naïve patients without significant comorbidities, and possibly not to treatment-refractory populations or in the context of salient psychiatric or physical comorbidities. We need further research to build prediction models for such difficult-to-treat depression and examine if similar variables would be at play. Fourth, it is not known whether the obtained models will be applicable when treatments other than the ones used in this megatrial are administered. The final models did not include treatment variables. However, when different drugs and different therapies are used, including psychotherapies or physical therapies, different factors might emerge as important predictors.
Finally, although using the latter half of the sample as a validation set is considered a form of external validation (Collins et al., 2015b), the validity coefficients thus obtained could have been higher than those that would be obtained using a dataset from completely new settings. The performance of the obtained models needs to be assessed with further validation samples from different settings and broader types of participants.
However, this study possesses several unique strengths. This is the largest cohort of patients with hitherto untreated episodes of major depression, treated with step-wise antidepressant pharmacotherapy. The participants were recruited in 48 clinics and hospitals across Japan. The dropout rates were <5% up to week 25, and the appropriate imputation method was applied for the missing data. The sample size allowed two datasets, each comprising approximately 1000 patients, one for derivation of the models and another for external validation of the models. The development of the prediction models followed the most recent guideline and used the one-step multivariable procedure (Collins et al., 2015a). The performance of the obtained models in the validation set was satisfactory. The study focused on remission, which is clearly the most desirable outcome of acute phase depression treatment (Nierenberg and Wright, 1999; Keller, 2003; Nierenberg, 2013).
We have provided the complete prediction models as Excel spreadsheets in the online Supplementary material. Patients and clinicians can enter their age, education, length of episode, PHQ-9 and BDI-II scores to obtain predicted probabilities of achieving remission at week 9 and at week 25. We hope that clinically informed, judicious use of this tool will help patients and clinicians make better informed decisions.
Toshi A. Furukawa 0000-0003-2159-3776
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291718003331
TAF has received lecture fees from Meiji, Mitsubishi-Tanabe, MSD and Pfizer. He has received research support from Mitsubishi-Tanabe. TK has received lecture fees from Eli Lilly and Mitsubishi-Tanabe, and has contracted research with GSK, MSD and Mitsubishi-Tanabe. He has received royalties from Kyowa Yakuhin. YS has received lecture fees from Janssen, Kyowa-Yakuhin, Meiji, MSD, Otsuka and Mitsubishi-Tanabe. KM has received lecture fees from Eisai, GSK, Kyowa Yakuhin, Meiji, MSD, Otsuka, Pfizer, Eli Lilly, Mochida, Yoshitomi, Dainippon-Sumitomo, Takeda and Shionogi. HF has received lecture fees from Mochida and Tsumura. NT has received lecture fees from Astellas, Eisai, Shionogi, Novartis, Fujifilm RI Pharma, Meiji, Mochida, MSD, Janssen, Eli Lilly and Dainippon-Sumitomo. MK has received lecture fees from Yoshitomi and a research grant from Novartis. MI has received a grant from Novartis Pharma. He has received lecture fees from Meiji Seika Pharma, Mochida and Takeda. MY has contracted research with Nippon Chemiphar.
The study was funded by the Ministry of Health, Labor and Welfare, Japan (H-22-Seishin-Ippan-008) from April 2010 through March 2012 to TAF (http://www.mhlw.go.jp/english/), and thereafter by the Japan Foundation for Neuroscience and Mental Health (JFNMH) to TAF (http://www.jfnm.or.jp/). The JFNMH received donations from Asahi Kasei, Eli Lilly, GSK, Janssen, MSD, Meiji, Mochida, Otsuka, Pfizer, Shionogi, Taisho, and Mitsubishi-Tanabe. The study is partly supported by Japan Agency for Medical Research and Development (18dk0307072) and the Ministry of Health, Labour and Welfare (H29-ICT-Ippan-010). The funders of the study had no role in study design, data collection, data analysis, data interpretation or writing of the report.
TAF is the principal investigator and had overall responsibility for the management of the study. TAF had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analyses. TAF conceived and designed this study. TAF and MY obtained the funding. YS, KM, HF, NT, MK, MY, MI and TK acquired, analysed or interpreted the data. TAF conducted the statistical analyses. TAF drafted the manuscript, and YS, KM, HF, NT, MK, MY, MI and TK contributed critical revision of the manuscript. All authors contributed to and approved the final manuscript.