1. Introduction
The age of disease onset is often analyzed to identify patient subgroups that differ in clinical course or genetic profile. Two general approaches to grouping data from patients with bipolar disorder have provided important and replicated findings. The first approach uses a clustering methodology (mixture analysis) to determine the optimal number of distinct subgroups in a sample based on the age of onset distribution [8]. Using this clustering methodology, researchers have identified three onset subgroups, with the youngest subgroup having the most severe course of illness and highest likelihood of a family history of mood disorders [2,8,9,22,24,35,37]. The second approach groups the data in a sample by patient year of birth and analyzes for a birth cohort effect [21]. Researchers have detected a strong birth cohort effect in bipolar disorder, with successive generations experiencing an earlier age of onset [12,14,15,21,23,33,34].
The purpose of this analysis is to evaluate whether a birth cohort effect influences the results of clustering based on the age of onset using a large international database of patients with bipolar I disorder [4]. This is important because the birth cohort may modify the number and composition of subgroups, which in turn may affect the subsequent search for distinct and meaningful clinical and genetic profiles.
2. Methods
2.1. Data collection
The data in this analysis were collected for a study of the impact of solar insolation on the age of onset of bipolar disorder, and are described in detail elsewhere [4,5]. The diagnosis of bipolar disorder was made by a psychiatrist according to DSM-IV criteria. The patient data were obtained retrospectively at 36 collection sites in 23 countries. In 20 sites, data were obtained by a combination of direct interviews and record review, in 8 sites primarily by direct interview and in 8 sites by record review. The age of onset was defined as the first occurrence of an episode of depression, mania or hypomania according to DSM-IV criteria. Additional data included a family history of any mood disorder in a first degree relative, and the polarity of the first episode (depressed, manic or hypomanic). Study approval from institutional review boards was obtained according to local requirements.
Data from 5465 patients with bipolar disorder were obtained from 36 collection sites: Aarhus, Denmark (n = 66); Athens, Greece (n = 51); Bangalore, India (n = 99); Barcelona, Catalonia, Spain (n = 200); Beer Sheva, Israel (n = 105); Buenos Aires, Argentina (n = 95); Cagliari, Sardinia, Italy (n = 206); Calgary, Canada (n = 126); Cape Town, South Africa (n = 100); Dresden, Germany (n = 35); Halifax, Canada (n = 102); Helsinki, Finland (n = 191); Hong Kong (n = 50); Kansas City, KS, USA (n = 21); Kuala Lumpur, Malaysia (n = 121); Los Angeles, CA, USA (n = 206); Medellín, Colombia (n = 189); Melbourne/Geelong, Australia (n = 161); Oslo, Norway (n = 127); Palo Alto, CA, USA (n = 48); Paris, France (n = 468); Porto Alegre, Brazil (n = 205); Poznan, Poland (n = 102); Rochester, MN, USA (n = 141); San Diego, CA, USA (n = 55); São Paulo, Brazil (n = 248); Salvador, Brazil (n = 121); Santiago, Chile (n = 346); Siena, Italy (n = 60); Thessaloniki, Greece (n = 52); Tokyo, Japan (n = 120); Trondheim, Norway (n = 238); Vitoria-Basque Country, Spain (n = 343); Worcester, MA, USA (n = 58); Wiener Neustadt, Austria (n = 253); and Würzburg, Germany (n = 356).
2.2. Database characteristics
Of the 5465 total patients, 4037 were diagnosed with bipolar I disorder, 1236 with bipolar II and 192 with bipolar NOS. Because the proportion of patients diagnosed with bipolar I disorder was highly imbalanced across the collection sites, varying from 23% to 99%, only the 4037 patients with a diagnosis of bipolar I disorder were included in this analysis. Of the 4037 patients, 2374 (58.8%) were female and 1663 (41.2%) were male. Onset occurred in the southern hemisphere for 1043 (25.8%) of the patients.
The mean age of the 4037 patients was 48.1 ± 14.5 years. The unadjusted mean age of onset for the 4037 patients was 25.4 years, similar to 25.7 years (n = 1665) in other research [3]. Family history was available for 3334 (82.6%) of the 4037 patients. Of the 3334 patients, 1848 (55.4%) had a positive family history and 1486 (44.6%) did not. The polarity of the first episode was available for 3601 (89.2%) of the 4037 patients. Of the 3601 patients, the first episode was depressed in 1748 (48.5%) and manic in 1853 (51.5%).
2.3. Onset location and country median age
This international database has several unique features. Although the data were collected in 36 collection sites in 23 countries, there were 318 unique onset locations (city and country) in 43 countries. Each onset location includes all reported locations within a 1 × 1 degree grid of latitude and longitude. The number of onset locations from each collection site reflects differences in country size, culture and migration patterns. The number of patients within each onset location varies, and the data within each onset location are correlated [4,5].
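The grouping of reported locations into 1 × 1 degree cells can be illustrated with a minimal sketch. It assumes the cells are aligned on whole degrees, which the text does not state, and the function name `grid_cell` and the coordinates are hypothetical.

```python
import math

def grid_cell(latitude, longitude):
    """Return the south-west corner of the 1 x 1 degree cell containing a point."""
    # Assumption: cells are aligned on whole degrees (the source does not say).
    return (math.floor(latitude), math.floor(longitude))

# Hypothetical coordinates: the first two points share a cell, the third does not.
locations = [(48.86, 2.35), (48.80, 2.13), (45.76, 4.84)]

# Group the reported locations by cell; each key would be one onset location.
cells = {}
for lat, lon in locations:
    cells.setdefault(grid_cell(lat, lon), []).append((lat, lon))
```

Using `math.floor` rather than `int` keeps negative latitudes and longitudes in the correct cell, since `int` truncates toward zero.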
There is a large difference in the median age of the population among the countries, varying over 20 years between the oldest (Japan, 45.8 years) and the youngest (South Africa, 25.5 years) [48]. For a disease with a variable age of onset that spans several decades like bipolar disorder, an older age of onset would be expected in a country with an older population [13,27]. Additionally, the country median age, which summarizes the age structure, provides information about the socioeconomic characteristics of a country [48].
2.4. Clustering approach
The clustering analysis was performed in two steps. First, generalized estimating equations (GEE) were used to estimate the effect of the country median age and, in some models, the birth cohort on the age of onset. Second, the residuals from the estimated GEE models, which contain the information not explained by the GEE model variables, were used for the cluster analysis.
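The two-step idea can be sketched as follows. This is not the GEE used in the study: ordinary least squares with a single covariate stands in for it, and the correlation within onset locations is ignored. With an identity link and an independence working correlation, GEE point estimates coincide with least squares, but the working correlation used in the study is not stated here. All data values and the helper `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: country median age and age of onset for six patients.
median_age = [25.5, 25.5, 38.0, 38.0, 45.8, 45.8]
onset_age = [19.0, 24.0, 22.0, 30.0, 27.0, 35.0]

# Step 1: model the age of onset as a function of the country median age.
a, b = fit_line(median_age, onset_age)
predicted = [a + b * x for x in median_age]

# Step 2: the residuals hold what the covariate did not explain;
# these, not the raw ages, would be passed to the cluster analysis.
residuals = [y - p for y, p in zip(onset_age, predicted)]
```

Because the model includes a constant, the residuals are centered on zero, which is why a mean predicted age has to be added back before cluster means can be read as ages.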
2.5. GEE
All GEE models have the age of onset as the dependent variable. A GEE model was used to accommodate both the correlated data and the unbalanced number of patients within the onset locations. All estimates adjust for the correlated onset locations by treating each onset location as a cluster, and include the country median age as an independent variable. A GEE uses a population-averaged or marginal approach, estimating the effect across the entire population rather than within the correlated onset locations [49]. A significance level of 0.01 was used to evaluate estimated coefficients. GEE analyses were performed using geepack 1.1-6 for R.
2.6. Mixture analysis
Mixture analysis was performed using model-based clustering with MCLUST 4.2 for R software [19], as in prior research [24]. Model-based clustering assumes the sample is a mixture of one or more normal distributions, uses a statistical probability model to determine both the number and composition of the clusters, and does not specify in advance the number, shape, volume or orientation of the distributions [17–19]. The best fitting model and number of clusters are selected using the Bayesian Information Criterion (BIC). MCLUST defines the BIC so that the model with the largest value is optimal.
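To illustrate how a BIC comparison selects the number of normal components, the sketch below fits one-dimensional mixtures by expectation-maximization. It is a simplified stand-in, not the MCLUST implementation: it handles only unequal-variance univariate mixtures, uses a crude initialization, and runs on simulated values rather than study data. The BIC is written as 2 log L − p log n, so larger is better. Under the equivalent form −2 log L + p log n, smaller is better. All function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_density(x, weights, means, sds):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

def fit_mixture(data, k, iterations=100):
    """Fit a k-component 1-D normal mixture by EM; return (weights, means, sds, loglik)."""
    n = len(data)
    ordered = sorted(data)
    # Initialise each component from one of k equal-sized slices of the sorted data.
    slices = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [1.0 / k] * k
    means = [sum(s) / len(s) for s in slices]
    grand_mean = sum(data) / n
    sds = [math.sqrt(sum((x - grand_mean) ** 2 for x in data) / n)] * k
    for _ in range(iterations):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
            total = sum(dens) + 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), 0.1)  # floor keeps a component from collapsing
    loglik = sum(math.log(mixture_density(x, weights, means, sds)) for x in data)
    return weights, means, sds, loglik

def bic(loglik, k, n):
    """BIC as 2*loglik - p*log(n), so larger is better; p counts free parameters."""
    n_params = 3 * k - 1  # k means, k variances, k - 1 free mixing weights
    return 2 * loglik - n_params * math.log(n)

# Simulated residuals (hypothetical, not study data): two overlapping normal groups.
random.seed(0)
residuals = ([random.gauss(-6, 3) for _ in range(250)]
             + [random.gauss(10, 4) for _ in range(150)])

fits = {k: fit_mixture(residuals, k) for k in (1, 2, 3)}
scores = {k: bic(fits[k][3], k, len(residuals)) for k in fits}
best_k = max(scores, key=scores.get)
```

The penalty term grows with the number of components, so an extra component is kept only when it raises the log-likelihood by more than its three additional parameters cost.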
Since the results of this study are population-based, a comparison cannot be directly made with the results for an individual country. However, to confirm the methodology using residuals, a comparison was made using data from just one country. Cluster analysis of age of onset was performed without any adjustments, as in prior studies in a single country [9]. The results were compared to cluster analysis using the age of onset residuals from the GEE model adjusted only for the correlated onset locations within the country. The mean predicted age of onset was added to the cluster midpoint for comparison. As shown in Table 1, there was no difference in the results. Also, the values were similar to prior findings [2,9].
Table 1 Comparison of results of cluster analysis of age of onset data for France (n = 371) using actual data versus residuals.

The entries in bold demonstrate that using the actual data, and the residuals mean + overall mean age of onset, produced the same result.
a Residuals calculated using generalized estimating equation (GEE) estimate of age of onset as a function of a constant with 28 onset locations within France.
b The overall mean of the estimated GEE age of onset is 24.97 years.
2.7. Impact of birth cohort
A large percentage of the 4037 patients in this database (36.8%) were born before 1960. As in prior research [14], three birth cohort groups were created: born before 1940, born between 1940 and 1959, and born after 1959. The impact of the birth cohort effect on the clustering was analyzed in three ways. First, using the entire sample, a GEE model was estimated without considering the birth cohort, and cluster analysis was then performed on the residuals. Second, using the entire sample, a GEE model was estimated that also adjusted for the birth cohort, and cluster analysis was then performed on the residuals. Third, a GEE model without the birth cohort adjustment was estimated for only the youngest cohort born after 1959, and cluster analysis was performed on the residuals.
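The three birth cohort groups defined above can be expressed as a small helper. The function name, the labels and the example birth years are hypothetical; only the cut points come from the text.

```python
def birth_cohort_group(birth_year):
    """Assign a birth year to one of the three cohort groups used in the analysis."""
    if birth_year < 1940:
        return "before 1940"
    if birth_year <= 1959:
        return "1940-1959"
    return "after 1959"

# Hypothetical birth years, counted by cohort group.
birth_years = [1932, 1940, 1959, 1960, 1985]
counts = {}
for year in birth_years:
    group = birth_cohort_group(year)
    counts[group] = counts.get(group, 0) + 1
```

In the second of the three analyses, this grouping would enter the GEE model as a categorical independent variable alongside the country median age.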
2.8. Clinical variables
The clinical variables of the patients in the subgroups detected by cluster analysis were compared. Clinical variables in this database were family history, gender and polarity of the first episode. The hypomanic and manic data were combined for analysis of polarity. Variables in the subgroups were compared using a chi-square test. For variables with a significant difference and more than two subgroups, logistic regression models were used for pairwise comparison.
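A minimal sketch of the chi-square comparison across subgroups follows. The counts are hypothetical and are not taken from the study tables. The p-value helper covers only 1 and 2 degrees of freedom, where the upper-tail probability has a simple closed form; a statistics package would be used in practice.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of subgroup and characteristic.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

def p_value(statistic, dof):
    """Upper-tail probability, using closed forms for 1 and 2 degrees of freedom."""
    if dof == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if dof == 2:
        return math.exp(-statistic / 2)
    raise ValueError("only 1 or 2 degrees of freedom are handled in this sketch")

# Hypothetical counts: rows are three onset subgroups,
# columns are family history positive / negative.
counts = [[60, 40], [45, 55], [40, 60]]
statistic, dof = chi_square(counts)
p = p_value(statistic, dof)
```

A three-subgroup by two-category table has 2 degrees of freedom, so a significant overall result does not say which subgroups differ. That is the role of the pairwise logistic regression models.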
3. Results
Of the 4037 patients, 220 (5.4%) were born before 1940 and had a mean age of onset of 38.4 years, 1267 (31.4%) were born between 1940 and 1959 and had a mean age of onset of 29.5 years, and 2550 (63.2%) were born after 1959 and had a mean age of onset of 22.2 years. The 16.2-year difference between the mean age of onset in the oldest and youngest birth cohort groups influenced the results of the clustering analysis, as shown in Tables 2A–2C. Without considering the birth cohort, the best fitting model for the entire sample (n = 4037) consisted of three normal distributions. The mean ages of onset of the three subgroups were 17.24 ± 3.20, 23.93 ± 5.12, and 32.20 ± 11.96 years, representing 41.7%, 24.7%, and 33.6% of the sample (Table 2A). With the birth cohort, the best fitting model for the entire sample (n = 4037) consisted of two normal distributions. The mean ages of onset of the two subgroups were 20.7 ± 5.84 and 30.1 ± 10.40 years, representing 62.1% and 37.9% of the sample (Table 2B). Considering only those born after 1959 (n = 2550), the best fitting model also consisted of two normal distributions. The mean ages of onset of the two subgroups were 18.11 ± 3.70 and 25.79 ± 8.41 years, representing 56.9% and 43.1% of the sample (Table 2C).
Table 2A Results of cluster analysis of age of onset data for all patients without birth cohort (n = 4037)ᵃ.
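As a consistency check on the cohort figures, the short calculation below recomputes the cohort shares, the gap between the oldest and youngest cohorts, and the cohort-weighted mean age of onset. The weighted mean comes to about 25.4 years, matching the unadjusted mean age of onset reported in Section 2.2. The variable names are illustrative; the numbers are those given in the text.

```python
# Cohort sizes and mean ages of onset as reported in the text.
cohorts = {
    "before 1940": (220, 38.4),
    "1940-1959": (1267, 29.5),
    "after 1959": (2550, 22.2),
}

total = sum(n for n, _ in cohorts.values())

# Share of the sample in each cohort, as a percentage rounded to one decimal.
shares = {name: round(100 * n / total, 1) for name, (n, _) in cohorts.items()}

# Mean age of onset across cohorts, weighted by cohort size.
weighted_mean = sum(n * mean for n, mean in cohorts.values()) / total

# Difference between the oldest and youngest cohort means.
gap = cohorts["before 1940"][1] - cohorts["after 1959"][1]
```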

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant and the country median age with 318 onset locations. The overall mean of the estimated GEE age of onset is 25.38 years.
Table 2B Results of cluster analysis of age of onset data for all patients with birth cohort (n = 4037)ᵃ.

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant, the country median age and birth cohort group with 318 onset locations. The overall mean of the estimated GEE age of onset is 25.40 years.
Table 2C Results of cluster analysis of age of onset data for patients born after 1959 (n = 2550)ᵃ.

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant and the country median age with 263 onset locations. The overall mean of the estimated GEE age of onset is 22.22 years.
In all cluster results, more patients in the youngest subgroup had a family history of mood disorders and a first episode with a depressed polarity (Tables 3A–3C). However, pairwise comparisons of the three subgroups detected without considering the birth cohort could not distinguish between the middle and oldest subgroups for family history or polarity of first episode. A significant difference in family history and polarity of the first episode was found only when comparing the youngest with the middle subgroup, and the youngest with the oldest subgroup (Table 4).
Table 3A Patient characteristics in subgroups from cluster analysis of age of onset data without birth cohortᵃ.

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant and the country median age with 318 onset locations.
Table 3B Patient characteristics in subgroups from cluster analysis of age of onset data with birth cohortᵃ.

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant, the country median age and birth cohort group with 318 onset locations.
Table 3C Patient characteristics in subgroups from cluster analysis of age of onset data for patients born after 1959ᵃ.

a Modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant and the country median age with 263 onset locations.
Table 4 Pairwise comparison of patient characteristics within subgroups from cluster analysis without birth cohortᵃ.

a Subgroups modeled using residuals from generalized estimating equation (GEE) estimate of age of onset as a function of a constant and the country median age with 318 onset locations. Pairwise comparison using logistic regression.
b Confidence interval.
c Odds ratio.
d Reference category.
4. Discussion
Data in this international study were combined from multiple dissimilar countries, and adjusted for large differences in the country median age. Even with these adjustments, cluster analysis identified three subgroups for the age of onset of bipolar I disorder when the birth cohort was not considered, similar to results from individual countries as summarized by Hamshere et al. [24]. This similarity validates the technique used in this analysis, and, in turn, the cluster analysis on the residuals confirms the presence of subgroups. When adjusting for the birth cohort, or considering only those born after 1959, only two subgroups were found. As in prior studies in which data were unadjusted for the birth cohort, the youngest subgroup was more likely to have a family history of mood disorders [3,22,24], and to have a first episode with a polarity of depression [39,40] when compared to either the middle or older subgroup. However, there was no significant difference between the middle and older subgroups for either family history or polarity of first episode, suggesting that the two older subgroups may not be clinically distinct. Since the birth cohort adjustment alters the number of subgroups, the usefulness of this confounder should be investigated in future studies.
The birth cohort effect is a proxy for the cultural environment experienced by different generations of patients and their physicians [43,45,47]. In addition to bipolar disorder, a strong birth cohort effect for age of onset was reported for other psychiatric disorders including depression [12,29,32], schizophrenia [16], substance abuse [28], phobias [36], and symptoms of anxiety [43]. Diverse cultural influences may contribute to the birth cohort effect including the immediate and long-term consequences of World War II [11,31,45,47], stress under totalitarian regimes [6,7], introduction and expansion of psychopharmacology [20,42], evolving diagnostic practices [1,10], changes in societal attitudes to mental illness [21,26,41], changes to family structure and the role of women [32,43], greater exposure to drugs of abuse [12,15,28], and the rise of the information age and social media.
There are several limitations to this study. The data collection process was not standardized across all sites, although diagnosis was based on DSM-IV criteria. Patient-reported age of onset is subject to recall or memory bias, especially among the elderly [38,46]. The family history data were not validated. Family history data are often inaccurate [25], and may be influenced by cultural attitudes towards mental illness [30]. A genetic anticipation effect may be contributing in part to the birth cohort effect [44]. Ascertainment bias may be present, since patients with bipolar disorder may recognize symptoms in offspring, resulting in earlier diagnosis. There could also be a selection bias in the age of onset for those born before 1959, since a younger age of onset is associated with a more severe disease course including suicide [40]. This analysis cannot address the importance of the birth cohort effect in any one country. Only three variables were available in this database to evaluate the clinical usefulness of the clustering results. This analysis used the MCLUST mixture algorithm, and clusters determined by other clustering techniques, or by cutoffs based on clinical observation, were not evaluated.
Researchers using mixture analysis have previously noted that a birth cohort effect may influence the composition of the subgroups, or the distribution of some clinical variables within the subgroups [2,9]. Regardless of the cause of the birth cohort effect, ignoring the cohort effect in a statistical analysis of age of onset may produce misleading results.
5. Conclusion
In conclusion, the results of this international study are consistent with prior findings that there are subgroups in the onset of bipolar I disorder [8,9], and that there is a birth cohort effect [14,21]. The birth cohort effect influenced the number and characteristics of the subgroups determined by clustering methodology. Further investigation is needed to determine if including the birth cohort in cluster analysis based on age of onset will identify subgroups that are more useful for clinical research.
Disclosure of interest
The authors declare that they have no conflicts of interest concerning this article.
Acknowledgements
We thank the European College of Neuropsychopharmacology (ECNP) Network Initiative for supporting the European Network of Bipolar Research Expert Centres (ENBREC). This work was also funded in part by the following: Canadian Institutes of Health Research (MA, Grant number 64410); the Research Council of Norway (OAA, Grant numbers 213837; 223273; 217776); South-East Norway Health Authority (OAA, Grant number 2013-123); an NHMRC Senior Principal Research Fellowship (M Berk, Grant number 1059660); INSERM (BE, Grant number C0829) and APHP (BE, Grant number AOR11096); the Spanish Government (AGP, Grant numbers PS09/02002 CIBER Network; EC10-333, PI10/01430, PI10/01746, PI11/01977, PI11/02708, 2011/1064, 11-BI-01, 1677-DJ-030, EC10-220); European Regional Development Funds (Grant numbers UE/2012/FI-STAR, UE/2013/TENDERMH, UE/2013/MASTERMIND), grants from Spanish Government (Grant numbers PI10/01430, PI10/01746, EC10-220, EC10-333, PI11/01977, 20111064, PI11/02708, PI12/02077, PI13/02252, PI13/00451), local grants from the Basque Government (Grant numbers 200911147, 2010111170, 2010112009, 2011111110, 2011111113); the Basque Foundation for Health Innovation and Research (Grant number BIO12/AL/002); the Spanish Clinical Research Network (Grant numbers CAIBER;1392-D-079) and the University of the Basque Country (Grant number IT679-13); Stanley Research Foundation (Grant number 03-RC-003); the Research Council of Norway (IM, Grant numbers ES488722, ES421716); the Regional Health Authority of South Eastern Norway (IM, Grant numbers 2011085, 2013088); DFG (AR, Grant numbers SFB TRR 58, B06, Z02); the DFG and Länder funds (AR, Grant number RTG1252/2); Medical Research Council of South Africa (DJS); Spanish Ministry of Economy and Competitiveness (EV, Grant numbers PI12/00912, PN 2008-2011); the Instituto de Salud Carlos III- Subdirección General de Evaluación y Fomento de la Investigación (EV); Fondo Europeo de Desarrollo Regional Unión Europea. Una manera de hacer Europa (EV); CIBERSAM (EV); the Comissionat per a Universitats i Recerca del DIUE de la Generalitat de Catalunya to the Bipolar Disorders Group (EV, Grant number 2009 SGR 1022), and the Department of Science and Technology INSPIRE scheme, Government of India (BV).
MB, EA, RA, CB, FB, RB, RHB, TDB, LB, YB, EYWC, MDZ, SD, AF, MAF, KNF, JGF, TG, HH, SH, CH, AI, ETI, FK, SK, BK, RK, MK, BL, RL, CLJ, UL, GM, MM, WM, SM, RM, FGN, CO, YO, AP, DQ, RR, NR, PR, JKR, KS, AMS, ES, CS, SS, AHS, KS, HT, YT, CT, MJW, MZ and PCW have no specific funding to acknowledge.