Neuroticism is included in nearly all trait theories of personality (see Digman, 1990) and has been established as a universal (i.e., not bound to any particular culture) trait in cross-cultural studies (McCrae & Costa Jr, 1997; Schmitt et al., 2007). While the term *trait* conveys the conceptualization of personality as stable, it is known that personality scores change over the lifespan (McCrae et al., 1999; Soto et al., 2011) and between cohorts (Smits et al., 2011). A meta-analysis of longitudinal studies revealed that the mean score of neuroticism decreases between age 12 and 40 years, and remains largely stable thereafter (Roberts et al., 2006). In addition, the stability of individual differences in neuroticism is characterized by an increase in rank order stability up until age 60, with a decrease in stability observed after age 60 (Roberts & DelVecchio, 2000).

Neuroticism is associated with mood and anxiety disorders (Clark et al., 1994; Enns & Cox, 1997; Kendler et al., 1993; Middeldorp et al., 2006; Roberts & Kendler, 1999). This association is partly due to pleiotropic genetic influences (Jardine et al., 1984; Kendler et al., 1993; Mackintosh et al., 2006; Middeldorp et al., 2005, 2011; Roberts & Kendler, 1999). Genetic pleiotropy also accounts for the association between neuroticism and borderline personality disorder (Distel et al., 2009). Neuroticism also has a moderate genetic correlation with somatic and neurological disorders such as migraine (Ligthart & Boomsma, 2012). The heritability of neuroticism has been estimated between 30% and 60%. There is little or no evidence for common environmental influences shared by family members, which is consistent with the lack of cultural transmission from parents to offspring (Lake et al., 2000). There is some evidence for non-additive genetic effects (Birley et al., 2006; Floderus-Myrhed et al., 1980; Keller et al., 2005; Lake et al., 2000; Loehlin et al., 1998; Rettew et al., 2006; van den Berg et al., 2014; Vukasović & Bratko, 2015). Eaves et al. (1998) suggested that this is more likely to be attributable to epistatic interaction (i.e., interaction between alleles at different genetic loci) than to genetic dominance (interaction within a genetic locus).

Rettew et al. (2006) found little evidence for quantitative changes in genetic or environmental variance in neuroticism between ages 12 and 17. Wray et al. (2007; see also Birley et al., 2006) reported genetic correlations across 22 years between 0.82 and 0.95, and environmental correlations between 0.24 and 0.53. These twin studies focused on correlations in neuroticism scores between measurement occasions. However, at each measurement occasion, the participants varied appreciably in age, which introduces possible age-related heterogeneity in genetic and environmental effects. In contrast, Viken et al. (1994) collected two repeated measures of neuroticism, six years apart, in twin pairs aged between 18 and 59 years, but reordered the data as a function of chronological age at first measurement. Viken et al. (1994) reported high genetic correlations (between 0.8 and 1), and low to moderate environmental correlations (between 0.25 and 0.54) between chronological ages. Briley and Tucker-Drob (2014) recently reported a meta-analysis of genetic studies of personality, with the age of twin and sibling pairs ranging from infancy to old age. Their study included neuroticism, well-being, and measures of psychopathology, such as aggression and inattention. They found that both the genetic and environmental stability increased with age. They reported a substantially higher environmental stability when they corrected for measurement error.

In the current twin study, we analyzed repeated measures of neuroticism assessed between ages 14 and 32 years. Our aim was to further elucidate the increase in the rank stability of neuroticism from adolescence to young adulthood, a period marked by a mean decrease in neuroticism. We worked with age bins comprising two years, which provided greater resolution of the changes in genetic and environmental stability than that provided in previous work (Viken et al., 1994). In addition to estimating genetic and environmental correlations across age, we addressed the effect of measurement error on the estimation of the environmental correlations between repeated neuroticism measures. Briley and Tucker-Drob (2014) corrected the (unshared) environmental variance for measurement error by quantifying the proportion of measurement error variance (1 − α) using Cronbach's α. Here, we accounted for measurement error by fitting a simplex (or autoregressive) model to longitudinal twin data (Boomsma & Molenaar, 1987). In this model, we distinguish between genetic and environmental variance involved in the auto-regression, and occasion-specific (transient) genetic and environmental variance. We assume that age-specific unshared environmental variance is largely due to measurement error.

## Methods

### Subjects and Measures

We analyzed the data of monozygotic (MZ) and dizygotic (DZ) twins in the Netherlands Twin Register (NTR). NTR participants were initially approached via city councils and later by other means (Willemsen et al., 2013). Participants were invited, on multiple occasions, to fill out and return a set of surveys, which included a neuroticism questionnaire. Since our main interest is in the stability of neuroticism between adolescence and adulthood, we focused on data from twins aged from 14 to 32. Neuroticism was assessed in 1991, 1995, 1997, 2000, and 2002. We reordered the data from the different surveys into nine age bins, each spanning two years (i.e., 14–15, 16–17, …, 28–29, 30–31). The dataset comprised 15,275 observations on 6,943 twins, including 1,392 complete MZ twin pairs and 1,826 complete DZ twin pairs. Table 1 contains the number of observations in each age bin. The overlap in samples between adjacent age bins is substantial, but drops off with the distance between the age bins. In view of this missingness, we used full information maximum-likelihood (FIML) estimation, under the assumption that the data are missing (completely) at random. We fitted all models in Mplus 6.11. We compared competing models on the basis of Akaike's information criterion (AIC), the Bayesian information criterion (BIC), and the sample-size adjusted Bayesian information criterion (SA-BIC).
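
The three information criteria can be illustrated with a minimal sketch. It assumes the standard definitions, with the sample-size adjusted BIC replacing n by (n + 2)/24. The function name and all numerical inputs below are hypothetical and are not taken from Table 2.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC and sample-size adjusted BIC for one fitted model."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2.0 * n_params
    bic = deviance + n_params * math.log(n_obs)
    # The sample-size adjusted BIC replaces n by (n + 2) / 24.
    sabic = deviance + n_params * math.log((n_obs + 2.0) / 24.0)
    return {"AIC": aic, "BIC": bic, "SA-BIC": sabic}


# Hypothetical log-likelihoods and parameter counts (illustration only).
full = information_criteria(log_likelihood=-1000.0, n_params=20, n_obs=500)
reduced = information_criteria(log_likelihood=-1003.0, n_params=12, n_obs=500)

# Lower values indicate the preferred model on each criterion.
preferred = {
    name: ("reduced" if reduced[name] < full[name] else "full") for name in full
}
```

In this made-up comparison the reduced model is preferred on all three criteria, because its small loss in likelihood is outweighed by the eight parameters it saves.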

Note to Table 1. Top: phenotypic correlations between age bins below the diagonal (in italics) and, above the diagonal, the available sample size for each pair of age bins. Middle: the MZ and DZ twin correlations per age bin. Bottom: ABV-N scale means and standard deviations (*SD*).

Neuroticism was assessed using the Amsterdamsche Biografische Vragenlijst (ABV; Wilde, 1970; i.e., the Amsterdam Biographic Questionnaire). The ABV neuroticism scale (ABV-N) was modeled on the Eysenck Personality Questionnaire (EPQ). The ABV-N is a 30-item instrument that contains questions like ‘Do you often worry about the past?’ The response options are ‘yes’, ‘no’ and ‘?’. Following the ABV test manual, item responses were weighted in calculating the neuroticism sum score (Wilde, 1970). Table 1 contains the ABV-N means and standard deviations by age and sex. Van den Berg et al. (2014) compared the ABV-N and other neuroticism scales and reported a large overlap in item content, and a high correlation (0.89) between the ABV-N and the IRT neuroticism scores.

### Statistical Model

At each age, we modeled the neuroticism score using a simplex model. The neuroticism score at time t was modeled as follows (discarding subject subscripts):
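
The display for equation 1 is not present in this copy. A reconstruction consistent with the symbol definitions that follow is given here as a sketch; the symbol N_{t} for the neuroticism score and the coding of sex as a single covariate are assumptions.

```latex
N_{t} = \beta_{0t} + \beta_{st}\,\mathrm{sex}
      + A_{t} + D_{t} + E_{t}
      + \tau_{At} + \tau_{Dt} + \tau_{Et}
\tag{1}
```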

In equation 1, β_{0t} is the occasion-specific intercept, β_{st} is the occasion-specific sex effect, and τ_{At}, τ_{Dt}, and τ_{Et} are (zero mean) occasion-specific additive genetic, dominance, and unshared environmental variables. We assume that τ_{Et} is largely due to measurement error. The (zero mean) additive genetic, dominance, and unshared environmental variables A_{t}, D_{t}, and E_{t} are decomposed into a part due to transmission of effects from earlier ages and a part due to innovation:
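
Equations 2 to 4 are likewise missing from this copy. Given the description of β_{At} as an auto-regression coefficient and ζ_{At} as the residual in the regression of A_{t} on A_{t-1}, a first-order autoregressive form such as the following would be consistent with the text. It is a reconstruction, not the original typesetting.

```latex
\begin{aligned}
A_{t} &= \beta_{At}\,A_{t-1} + \zeta_{At} && (2)\\
D_{t} &= \beta_{Dt}\,D_{t-1} + \zeta_{Dt} && (3)\\
E_{t} &= \beta_{Et}\,E_{t-1} + \zeta_{Et} && (4)
\end{aligned}
```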

In equation 2, β_{At} is the auto-regression coefficient, and ζ_{At} is the innovation, that is, the residual in the regression of A_{t} on A_{t-1} (the same interpretation applies to equations 3 and 4). Note that at t = 1, we set A_{1} = ζ_{A1}, D_{1} = ζ_{D1}, and E_{1} = ζ_{E1}. While we included sex-related mean differences at each occasion (β_{st}), we imposed a single (genetic) covariance model in the male and female twins, as large studies found no evidence for moderation of effects by gender (van den Berg et al., 2014; Vukasović & Bratko, 2015).

Equations 1–4 give rise to the following 9×9 covariance matrices:
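
The matrix expressions did not survive in this copy. Under the usual simplex parameterization, and assuming B_{A} carries the coefficients β_{At} on its first subdiagonal, the definitions in the next paragraph imply a structure along these lines (a reconstruction, with the numbering inferred from the later reference to equations 6 and 7):

```latex
\begin{aligned}
\Sigma_{A} &= (I - B_{A})^{-1}\,\Psi_{A}\,\bigl[(I - B_{A})^{-1}\bigr]^{\top} + \Theta_{A} && (5)\\
\Sigma_{D} &= (I - B_{D})^{-1}\,\Psi_{D}\,\bigl[(I - B_{D})^{-1}\bigr]^{\top} + \Theta_{D} && (6)\\
\Sigma_{E} &= (I - B_{E})^{-1}\,\Psi_{E}\,\bigl[(I - B_{E})^{-1}\bigr]^{\top} + \Theta_{E} && (7)
\end{aligned}
```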

where I is the 9×9 identity matrix, B_{A} (9×9) contains the autoregressive coefficients β_{At} (t = 2, …, 9), the diagonal covariance matrix (9×9) Ψ_{A} contains var(A_{1}) (t = 1) and the variances var(ζ_{At}) (t = 2, …, 9), and the diagonal covariance matrix (9×9) Θ_{A} contains the variances of τ_{At} (t = 1, …, 9). Equations 6 and 7 are defined analogously. Note that the first and the last variances (e.g., in the matrix Θ_{A}, var[τ_{A1}] and var[τ_{A9}]) are not identified. Identification is achieved by fixing these variances to zero, or constraining them to be equal to the adjacent variances. We adopted the latter constraint in the specification of Θ_{A}, Θ_{D}, and Θ_{E}. In MZ and DZ twin samples, the expected 18×18 partitioned covariance matrices are
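
The partitioned matrices are missing from this copy. With the weights 1/2 and 1/4 mentioned in the text, the standard ADE twin expectations would read as follows; the symbols Σ_{MZ} and Σ_{DZ} are assumed names.

```latex
\Sigma_{MZ} =
\begin{pmatrix}
\Sigma_{A} + \Sigma_{D} + \Sigma_{E} & \Sigma_{A} + \Sigma_{D}\\
\Sigma_{A} + \Sigma_{D} & \Sigma_{A} + \Sigma_{D} + \Sigma_{E}
\end{pmatrix},
\qquad
\Sigma_{DZ} =
\begin{pmatrix}
\Sigma_{A} + \Sigma_{D} + \Sigma_{E} & \tfrac{1}{2}\Sigma_{A} + \tfrac{1}{4}\Sigma_{D}\\
\tfrac{1}{2}\Sigma_{A} + \tfrac{1}{4}\Sigma_{D} & \Sigma_{A} + \Sigma_{D} + \Sigma_{E}
\end{pmatrix}
```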

where the weights 1/2 and 1/4 follow from quantitative genetic theory given a set of explicit assumptions, including random mating, which has been found to be tenable with respect to neuroticism (Eaves et al., 1998), and the absence of interaction and covariance among the genetic and environmental variables.
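
To make the covariance structure concrete, the following sketch builds the covariance matrix implied by a first-order simplex process and assembles the expected MZ and DZ matrices from it. It uses three occasions instead of nine, and every parameter value is hypothetical. The function names `simplex_cov` and `twin_cov` are illustrative and do not correspond to the Mplus specification used in the study.

```python
def simplex_cov(betas, innov_vars, specific_vars):
    """Covariance matrix implied by a first-order autoregressive (simplex) process.

    betas[t]         auto-regression coefficient into occasion t (betas[0] is unused)
    innov_vars[t]    innovation variance at occasion t (factor variance at t = 0)
    specific_vars[t] occasion-specific variance added to the diagonal
    """
    n = len(innov_vars)
    latent = [0.0] * n
    latent[0] = innov_vars[0]
    for t in range(1, n):
        # var(X_t) = beta_t^2 * var(X_{t-1}) + var(innovation_t)
        latent[t] = betas[t] ** 2 * latent[t - 1] + innov_vars[t]
    cov = [[0.0] * n for _ in range(n)]
    for s in range(n):
        path = 1.0
        for t in range(s, n):
            if t > s:
                path *= betas[t]
            # cov(X_s, X_t) = var(X_s) * product of betas from s + 1 to t
            cov[s][t] = cov[t][s] = latent[s] * path
        cov[s][s] += specific_vars[s]
    return cov


def twin_cov(sig_a, sig_d, sig_e, r_a, r_d):
    """Expected 2n x 2n twin covariance matrix for given genetic weights."""
    n = len(sig_a)
    within = [[sig_a[i][j] + sig_d[i][j] + sig_e[i][j] for j in range(n)]
              for i in range(n)]
    cross = [[r_a * sig_a[i][j] + r_d * sig_d[i][j] for j in range(n)]
             for i in range(n)]
    top = [within[i] + cross[i] for i in range(n)]
    bottom = [cross[i] + within[i] for i in range(n)]
    return top + bottom


# Hypothetical parameter values for three occasions (illustration only).
sig_a = simplex_cov([0.0, 0.9, 0.95], [280.0, 50.0, 25.0], [0.0, 0.0, 0.0])
sig_d = [[0.0] * 3 for _ in range(3)]  # AE model: no dominance variance
sig_e = simplex_cov([0.0, 0.6, 0.7], [120.0, 90.0, 80.0], [100.0, 100.0, 100.0])

mz = twin_cov(sig_a, sig_d, sig_e, r_a=1.0, r_d=1.0)
dz = twin_cov(sig_a, sig_d, sig_e, r_a=0.5, r_d=0.25)
```

The recursion is algebraically the same as the matrix form (I − B)^{-1} Ψ [(I − B)^{-1}]^T + Θ when B has the auto-regression coefficients on its first subdiagonal; it simply avoids a matrix inverse.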

We expressed the unshared environmental covariance matrix corrected for attenuation due to measurement error as follows:
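
The expression itself is missing from this copy. If the occasion-specific variances in Θ_{E} are treated as pure measurement error, removing them from Σ_{E} leaves only the autoregressive part. A reconstruction along these lines, with the asterisk as an assumed notation for the corrected matrix, would be:

```latex
\Sigma_{E}^{*} = \Sigma_{E} - \Theta_{E}
             = (I - B_{E})^{-1}\,\Psi_{E}\,\bigl[(I - B_{E})^{-1}\bigr]^{\top}
```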

This is based on the strong assumption that the variance in the matrix Θ_{E} is dominated by measurement error. Note that the uncorrected environmental correlation between ages t-1 and t is

and the corrected correlation is
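
Neither correlation expression survived in this copy. Under the autoregressive model, cov(E_{t-1}, E_{t}) = β_{Et} var(E_{t-1}), so one reconstruction consistent with the definitions above gives the uncorrected correlation first and the corrected correlation second. The symbols r_{E} and r_{E}^{*} are assumed names.

```latex
\begin{aligned}
r_{E}(t-1,\,t) &=
\frac{\beta_{Et}\,\operatorname{var}(E_{t-1})}
     {\sqrt{\bigl[\operatorname{var}(E_{t-1}) + \operatorname{var}(\tau_{E,t-1})\bigr]
            \bigl[\operatorname{var}(E_{t}) + \operatorname{var}(\tau_{Et})\bigr]}}\\[6pt]
r_{E}^{*}(t-1,\,t) &=
\frac{\beta_{Et}\,\operatorname{var}(E_{t-1})}
     {\sqrt{\operatorname{var}(E_{t-1})\,\operatorname{var}(E_{t})}}
\end{aligned}
```

The two expressions differ only in the denominator, so the corrected correlation is never smaller in absolute value than the uncorrected one.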

## Results

The phenotypic correlations between age bins are between ~0.40 and ~0.70 (Table 1). Females scored significantly higher than males at all ages (Table 1), as often reported in previous studies (Roberts et al., 2006; Soto et al., 2011). We observed a significant decrease in mean values with age (χ^{2}(16) = 301.2, *p* < .001). The twin correlations ranged from ~0.61 to ~0.66 in the MZ pairs, and from ~0.10 to ~0.29 in the DZ pairs (Table 1). The twin correlations suggested large genetic effects and the absence of shared environmental effects. The twin correlations at ages 24–25, 28–29, and 30–31 suggested dominance, as twice the DZ twin correlation is appreciably lower than the MZ twin correlation. The full ADE simplex model failed to converge (presumably due to missingness in the data) and does not feature in Table 2. We fitted a model in which only the age-specific dominance variance var(τ_{Dt}) was included (i.e., without the auto-regression terms), and this model did not fit better than the AE simplex model (see Table 2). We also considered the ADE simplex model without age-specific D variance, but this model did not converge, and likewise does not feature in Table 2. We observed that the estimated variances var(τ_{At}) were very small and not significant at any age. In addition, we noted that the estimated variances var(τ_{Et}) were very similar in magnitude. Constraining var(τ_{At}) to zero and var(τ_{Et}) to be equal was judged to be acceptable on the basis of the AIC, BIC, and SA-BIC (Table 2).
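
The informal reading of the twin correlations can be sketched as a small screening function. It assumes the textbook ADE expectations r_MZ = a² + d² and r_DZ = a²/2 + d²/4, which is a heuristic and not the model-fitting procedure used in this study. The function name and the input correlations are hypothetical.

```python
def ade_screen(r_mz, r_dz):
    """Rough ADE variance components implied by a pair of twin correlations.

    Solves r_MZ = a2 + d2 and r_DZ = a2 / 2 + d2 / 4 for a2 and d2.
    """
    a2 = 4.0 * r_dz - r_mz
    d2 = 2.0 * r_mz - 4.0 * r_dz
    e2 = 1.0 - r_mz
    return {
        "a2": a2,
        "d2": d2,
        "e2": e2,
        # Dominance is suggested when twice the DZ correlation is below the MZ one.
        "dominance_suggested": r_mz > 2.0 * r_dz,
    }


# Hypothetical correlations, chosen for illustration only.
screen = ade_screen(r_mz=0.60, r_dz=0.25)
```

Estimates obtained this way can fall outside the range from 0 to 1 when the DZ correlation is very low, which is one reason formal model comparison is preferred over this screen.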

Table 3 contains the variance estimates and the standardized variance components, including a breakdown into environmental and error variance (i.e., occasion-specific environmental variance). Heritability estimates showed a decreasing trend, starting at 0.57 (age 14–15) and ending at 0.47 (age 30–31). The decrease in heritability was due to an increase in the environmental variance (from ~215 at age 14–15 to ~323 at age 30–31). The genetic variance in neuroticism remained very similar between ages 14 and 31 (from ~282 at age 14–15 to ~285 at age 30–31; Table 3). The estimated genetic covariance matrix Σ_{A} revealed high genetic stability between the ages of 14 and 31. The genetic correlations between adjacent ages increased with age, as shown in Figure 1: the correlation between consecutive age bins ranged from 0.84 (age 14–15 to age 16–17) to 0.97 (age 28–29 to age 30–31). The genetic innovation variances, var(ζ_{At}), at ages 24–25, 28–29, and 30–31 were not significantly greater than zero (*p*_{24–25} = .47, *p*_{28–29} = .65, and *p*_{30–31} = .26). Dropping these innovation variances did not lead to a deterioration in model fit, χ^{2}(3) = 0.62, *p* = .88. This suggests a nearly perfect genetic stability between ages 24 and 31.

The table includes the standardized genetic variance component (h^{2}) and the standardized environmental variance components, broken down into environmental (e^{2}) and error components (denoted error^{2}). Note that the error^{2} terms are equal to the standardized unshared environmental age specific components (raw variance terms in Θ_{E}).

As shown in Figure 2, the environmental covariance matrix Σ_{E} revealed relatively low environmental stability. The uncorrected environmental correlations between ages 14–15 and 16–17 and between ages 28–29 and 30–31 are 0.31 and 0.57, respectively. The genetic stability was appreciably greater than the environmental stability. Table 3 contains the proportion of genetic variance (h^{2}), and the proportions of environmental and error variance (Figure 3). Taking one minus the proportion of error as an index of reliability, the results in Table 3 implied a reliability between 0.75 and 0.81. Again, assuming the age-specific environmental effects were completely attributable to measurement error, we calculated the correlations corrected for this error variance. As expected, the corrected environmental stability was appreciably higher: the correlation between ages 14–15 and 16–17 was 0.70, and between ages 28–29 and 30–31 was 0.93. The environmental stability increased with age because the environmental innovation variance, var(ζ_{Et}), decreased with age and was no longer significant at age 30–31 (*SD*[ζ_{E,30–31}] = 5.205, *t* = 1.455, *p* = .15).
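
The effect of the correction can be illustrated numerically. The sketch below removes an assumed error variance from the diagonal of a small environmental covariance matrix and compares the correlation before and after. All numbers are hypothetical and are not the estimates in Table 3; the function names are illustrative.

```python
import math


def correlation(cov, i, j):
    """Correlation between occasions i and j implied by a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


def remove_error(cov, error_vars):
    """Subtract occasion-specific (error) variances from the diagonal only."""
    n = len(cov)
    return [[cov[i][j] - (error_vars[i] if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Hypothetical environmental covariance matrix for two adjacent age bins.
sigma_e = [[200.0, 60.0],
           [60.0, 220.0]]
# Hypothetical occasion-specific variances, treated entirely as measurement error.
theta_e = [110.0, 110.0]

uncorrected = correlation(sigma_e, 0, 1)
corrected = correlation(remove_error(sigma_e, theta_e), 0, 1)
```

Because the covariance is untouched and only the variances shrink, the corrected correlation is always at least as large as the uncorrected one, which is why it is best read as an upper bound under the stated assumption.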

## Discussion

Using a cohort sequential twin design, we quantified the genetic and environmental stability in neuroticism. Restructuring the data from measurement occasion into age bins allowed us to estimate age-related changes in genetic and environmental stability of neuroticism in the critical period between ages 14 and 31 years, a period that is characterized by an increase in the phenotypic stability of neuroticism (Roberts & DelVecchio, 2000; Soto et al., 2011). We found evidence for increasing genetic stability resulting in nearly perfect rank stability after age 24, that is, genetic correlations approaching unity. When we corrected for measurement error, we also observed substantial environmental stability. Correcting for measurement error provides an upper bound for the environmental stability, whereas the literature often reports only the lower bound (i.e., not corrected for measurement error). It is important to note that correcting for measurement error does not guarantee a high stability, as low transmission and high innovation would still result in a low stability.

The substantial environmental stability is compatible with the observation that life events have a lasting effect on mean neuroticism scores (Jeronimus et al., 2013). Other unique environmental factors that may have long-lasting effects are adverse economic circumstances, which have also been observed to have enduring effects on, for example, externalizing problems (Ramanathan et al., 2013). This result is further consistent with a meta-analysis of a more broadly defined ‘personality’ phenotype (Briley & Tucker-Drob, 2014). While Briley and Tucker-Drob (2014) used a different methodology to correct for measurement error, their conclusion is similar: correcting for measurement error results in substantial environmental stability, and this stability increases with age. Not correcting for measurement error leads to a systematic underappreciation of the amount of environmental stability in neuroticism. This conclusion may generalize to other traits.

The high degree of genetic stability that we observed is in line with previous longitudinal studies of the genetic stability of neuroticism in adolescents and adults (Gillespie et al., 2004a; Viken et al., 1994; Wray et al., 2007). The genetic stability in neuroticism between the ages of 14 and 31 is also similar to the genetic stability reported for symptoms of anxiety and depression, two traits strongly correlated with neuroticism (Gillespie et al., 2004b; Nivard et al., 2014). The implications of the high degree of genetic stability for genome-wide association studies (GWAS) were discussed by Wray et al. (2007). Here, we add that repeated measures can be used to good effect in analyses of the association between neuroticism and polygenic scores or genetic variants such as SNPs or CNVs. Specifically, discarding repeated measures in the regression of neuroticism scores on the genotype scores or genetic variants is not advisable, as this will lead to lower statistical power compared to the regression of all repeated measures. This is comparable to discarding one of the MZ twins in a GWAS including MZ twin pairs (see Minică et al., 2014). Ideally, association between genetic variants and repeated measures of a trait would be carried out in the context of a structural equation model to account for all subtleties in the genetics of a trait across age or repeated measures. However, as computational efficiency is often an issue in genome-wide studies, and the genetic contributions to neuroticism are stable, one could also decide to aggregate repeated measures by taking the mean. Alternatively, the regression of the repeated measures on genotypes (CNVs or SNPs) and polygenic scores can be carried out using generalized estimating equations to account for the dependency among the repeated measures (e.g., see Minică et al., 2015). The inclusion of repeated measures, where available, in GWAS of neuroticism is a possible way to boost power and extend current gene-finding efforts (de Moor et al., 2015).

Unlike other genetically informative extended pedigree studies (Eaves et al., 1998; Keller et al., 2005; Lake et al., 2000), we found no evidence for a significant contribution of non-additive genetic variation. Ordering the data into two-year age bins allowed us to estimate the changes in environmental and genetic stability with good temporal resolution. Creating smaller sub-samples per age category may have reduced the power to detect dominance variance and other non-additive effects. Future work, perhaps combining samples from multiple centers as made possible by the neuroticism score harmonization by van den Berg et al. (2014), may make it possible to concurrently estimate the effects of non-additive genetic variance and age-related effects on neuroticism. International collaboration on the genetics of neuroticism has been successful and yielded the first genome-wide significant finding for neuroticism (de Moor et al., 2015). Similar collaborations between twin and family registries could prove useful in further elucidating the genetic and environmental processes underlying the development of personality across the lifespan.

## Acknowledgments

Funding was provided by the Netherlands Scientific Organization (NWO) (912-100-20): ‘Genetic influences on stability and change in psychopathology from childhood to young adulthood’ and the Royal Academy of Sciences Academy Professor Award (PAH/6635).