Studies of twins have made substantial contributions to the understanding of the relative importance of genes and the environment in many human traits (Polderman et al., Reference Polderman, Benyamin, de Leeuw, Sullivan, van Bochoven, Visscher and Posthuma2015). MZ twins are particularly useful in the investigation of developmental plasticity, as the co-twins are essentially genetically identical, and are matched for age, sex, and cohort-specific effects (Bell & Spector, Reference Bell and Spector2011). The observation of epigenetic discordance within MZ twin pairs could support an epigenetic mechanism for the developmental origins of health and disease (DOHaD) paradigm where intrauterine exposures predispose an individual to disease later in life (Gordon et al., Reference Gordon, Joo, Powell, Ollikainen, Novakovic, Li and Saffery2012; Loke et al., Reference Loke, Novakovic, Ollikainen, Wallace, Umstad, Permezel and Craig2013).
MZ twins can be categorized by their type of placentation. Whereas MC pairs share the same placenta, individuals in a DC pair each have their own placenta. The type of placentation is determined by the timing of embryo splitting post conception. It is thought that twinning that occurs early (usually within the first four days), results in DC or separate placentas, similar to dizygotic twins. When twinning occurs at a later stage, the twin pair may be MC.
Given the important role in nutrient transfer and buffering against potentially harmful in utero exposures, it would seem a reasonable assumption that pairs of MZ twins that share a single placenta may be more similar epigenetically compared to those with separate placentas. However, Kaminsky et al. (Reference Kaminsky, Tang, Wang, Ptak, Oh, Wong and Petronis2009) found that the 10 pairs of DC twin pairs were more similar than 10 MC twin pairs for DNA methylation from buccal cells, with the mean of the per-probe intra-class correlations being close to zero for MC twin pairs. A different approach to measuring similarity of MZ twin pairs was taken in Gordon et al. (Reference Gordon, Joo, Powell, Ollikainen, Novakovic, Li and Saffery2012), who looked at distributions of intra-class correlations take over all probes within a MZ pair as well as Euclidean distances between members of an MZ pair for both cord blood mononuclear cells and human umbilical vascular endothelia cells. As in Kaminsky et al. (Reference Kaminsky, Tang, Wang, Ptak, Oh, Wong and Petronis2009), this study provides an indication that MC twin pairs were less similar than DC MZ twin pairs, but reducing the data to a single correlation measure per twin pair (eight MC, nine and five DC) means these differences were not statistically significant.
Using data from the Brisbane Systems Genetics Study (BSGS; McRae et al., Reference McRae, Powell, Henders, Bowdler, Hemani, Shah and Montgomery2014), we investigated the role of placentation on the similarity of DNA methylation levels of adolescent twin pairs. We show that MC MZ twin pairs are more epigenetically similar. Regions of the genome showing significant differences in within-pair similarity in DNA methylation between MC and DC MZ pairs are identified and demonstrated to be involved in regulation of transcription.
The BSGS is described in detail elsewhere (McRae et al., Reference McRae, Powell, Henders, Bowdler, Hemani, Shah and Montgomery2014; Powell et al., Reference Powell, Henders, McRae, Caracella, Smith, Wright and Montgomery2012). In brief, the BSGS has DNA methylation measured on 614 individuals from 117 families of European descent recruited as part of a study on adolescent twins. The mothers of these families were asked on multiple occasions whether their twins had ‘separate placentas’, ‘joined placentas’, or a ‘single placenta’. Taking only twin pairs that were consistently reported to have either a single or two placentas left 18 MC (10 male pairs and 8 female, average age 13.6 years, SE 1.9), and 16 DC twin pairs (8 male pairs, 8 female, average age 18.8 years, SE 2.1).
Extraction of DNA and bisulphite conversion is described in detail elsewhere (McRae et al., Reference McRae, Powell, Henders, Bowdler, Hemani, Shah and Montgomery2014). Bisulfite converted DNA samples were hybridized to Illumina HumanMethylation450 BeadChips using the Infinium HD Methylation protocol and Tecan robotics (Illumina, San Diego, CA, USA). The HM450 BeadChip-assessed methylation status was interrogated at 485,577 CpG sites across the genome. It provides coverage of 99% of RefSeq genes. Methylation scores for each CpG site are obtained as a ratio of the intensities of fluorescent signals and are represented as β values. Samples in the BSGS study were randomly placed with respect to the chip they were measured on and to the position on that chip in order to avoid any confounding with family. In particular, the design assured that no twin pair had both members on the same array. DNA methylation data are available at the Gene Expression Omnibus under accession code GSE56105.
Probes that have been annotated as binding to multiple chromosomes (Price et al., Reference Price, Cotton, Lam, Farré, Emberly, Brown, Robinson and Kobor2013) were removed from the analysis, as were non-CpG probes. In addition, all probes on the sex chromosomes were excluded as these are not comparable across the sexes due to high methylation in females on the X-chromosome from X inactivation and the lack of Y-chromosome in females. The probability of a probe within a sample either being called as missing or with a detection p value less than .001 was estimated from the average rate across all probes and samples. A threshold for probes showing significant deviation from random missingness (or excess poor binding) was determined by testing against a binomial distribution for the number of samples at the 0.05 significance level with a Bonferroni correction for the number of probes. Probes with more than 11 individuals with missing data or more than five individuals with detection p values >.001 were removed. After cleaning, data for a total of 417,069 probes remained for subsequent analysis.
Full details of normalization are described elsewhere (McRae et al., Reference McRae, Powell, Henders, Bowdler, Hemani, Shah and Montgomery2014). In brief, individual probes were normalized across all samples using a generalized linear model with a logistic link function using the entire BSGS dataset. Corrections were made for the effects of chip (which encompasses batch processing effects) position on the chip, sex, age, age2, and interactions between sex and age, and blood cell counts estimated from the methylation data (Houseman et al., Reference Houseman, Accomando, Koestler, Christensen, Marsit, Nelson and Kelsey2012).
Testing Differences in MZ Correlations
For each of the 417,069 DNA methylation probes, the MZ pair intra-class correlation was estimated for both MC and DC pairs. Then the test-statistic (z score) for the difference between two correlations was calculated as
Functional Categorization of Genes
The gene with its transcription start site nearest to the array probe was identified for all probes (Price et al., Reference Price, Cotton, Lam, Farré, Emberly, Brown, Robinson and Kobor2013). Due to the indication of an inflation of probes with low p values in the QQ-plot from the MWAS, we considered all probes with a p value less than 10−4 for the gene set analysis. The 333 unique genes from 351 probes were tested for over-representation of functional annotation using DAVID — The Database for Annotation, Visualization, and Integrated Discovery — v6.7 (Huang et al., Reference Huang, Sherman and Lempicki2009a, Reference Huang, Sherman and Lempicki2009b).
Figure 1 shows the distribution of intra-class correlations from the 18 MC and 16 DC twin pairs. For example, the MC twin pairs have significantly more probes with correlations greater than 0.5 (19.4% vs. 14.9%, z = 54, p value smaller than machine precision).
While the power in this study comes from the comparison of the distribution of intra-class correlations between MC and DC MZ twin pairs, we also performed a methylome-wide association study (MWAS), where each methylation probe across the genome is tested for having a difference in correlation between MC and DC MZ twin pairs. Despite its low power, the MWAS identified a number of significant genomic probes (Figure 2a). The eight probes passing a stringent Bonferroni significance threshold are summarized in Table 1. Two of these probes are located near the MHC region on chromosome 6, which also contains a large number of probes with suggestive (p < 10−5) significance (cg10507304, p = 1.0 × 10−7; ch18113826, p = 39 × 10−7; ch14047561, p = 1.2 × 10−6; ch11040238, p = 2.7 × 10−6; ch01792601, p = 9.4 × 10−6). The Q-Q plot (Figure 2b) shows a deviation from the expected line for large test statistics (lambda = 1.09).
* Position of probe CpG target from genome build hg19.
Given the deviation of the QQ plot from expected under the null, we investigated the functionality of the 351 probes with a p value less than 10−4 as this will capture additional differences between MC and DC twins at the cost of some false positives. A functional classification of the 333 unique genes with the closest transcription start site to the probes using DAVID found a number of general biological processes significantly over-represented with a Benjamini–Hochberg p value < .001 (Table 2). The significantly over-represented categories were primarily involved in the regulation of transcription, but also included categories related to neuronal development and cellular differentiation. The probes with p value less than 10−4 also show a significantly different (p = .01) distribution of HIL annotation (Price et al., Reference Price, Cotton, Lam, Farré, Emberly, Brown, Robinson and Kobor2013), with less probes in high-density CpG islands (24% vs. 32% expected) and more probes in intermediate density areas (30% vs. 24% expected). Genomic annotation (e.g., exonic, intronic, upstream, intergenic) of these CpG sites using ANNOVAR (Wang et al., Reference Wang, Li and Hakonarson2010) found no significant deviation for the 351 probes from the total set (p = .94).
This study has demonstrated that the blood of co-twins from MZ twin pairs sharing a common placenta are more epigenetically similar than those with separate placentas in blood samples collected when the twins were on average 14 years old. This effect was observed in adolescent twin pairs, indicating that epigenetic changes due to environmental exposures in utero remain throughout early life. This result is in contrast to an earlier study (Kaminsky et al., Reference Kaminsky, Tang, Wang, Ptak, Oh, Wong and Petronis2009), which found a higher correlation in DC twin pairs. However, they also found an average MC MZ correlation of close to zero, which is inconsistent with our understanding of the genetic control of DNA methylation (McRae et al., Reference McRae, Powell, Henders, Bowdler, Hemani, Shah and Montgomery2014), and indicative of issues due to potential batch effects in their study. Gordon et al. (Reference Gordon, Joo, Powell, Ollikainen, Novakovic, Li and Saffery2012) took a different approach to measuring similarity in MZ twin pairs, which generated a single correlation (or Euclidian distance) across all probes in a twin pair. This has the disadvantage that only a single measure is generated per twin pair and the power afforded by using ~400,000 comparisons across the genome is lost, which resulted in their study showing no significant differences between MC and DC twins.
An investigation of the potential function of those regions that are more correlated in MC than DC MZ twins identified a number of gene ontology categories that were significantly over-represented. These included low level biological processes involved in the regulation of transcription, neuronal development, and cellular differentiation. A reasonable extrapolation of this result is that MC MZ twins would be more phenotypically similar for complex traits and disease status than DC MZ twins. However, there is conflicting evidence for this in the literature, with MC MZ twins found to be more concordant than DC MZ twins for intelligence (Jacobs et al., Reference Jacobs, Van Gestel, Derom, Thiery, Vernon, Derom and Vlietinck2001), but more discordant for birth weight (Carroll et al., Reference Carroll, Tyfield, Reeve, Porter, Soothill and Kyle2005). Indeed, a review by Charlemain and Pons (Reference Charlemain and Pons1998) concludes that MC MZ are generally more similar that DC MZ for ‘behavioral’ phenotypes and less similar for ‘biological’ phenotypes. For birth weight it is assumed the increased discordance for MC MZ twins is due to greater competition for the shared blood supply in the placenta for most MC twins. This is a potential explanation for the increased discordance in other biological phenotypes, but it is unclear why it would not also apply to behavioral phenotypes as well.
The difference in similarity between MC and DC MZ twins has implications for the estimation of heritability using twin models. Dizygotic twins have separate placentas, and so have an intra-uterine environment equivalent to only DC MZ twins, so it is possible that the common environment assumption when estimating heritabilities would only hold when excluding MC MZ twins. This study was limited in only comparing 18 and 16 pairs of MC and DC MZ twins, and the deviation from the expected line on the Q-Q plot indicates that we would find many more significant regions with a larger cohort. In addition, using clinically derived chronicity would have the potential to increase the separation between MC and DZ twins, as many DC placentas are fused and essentially look like a single placenta. However, it is of note that the average correlation of DNA methylation levels between a range of relative pairs from the complete BSGS dataset — of which the individuals analyzed here are a subset — fitted remarkably well to a simple additive genetic model and any common environmental effect for twin pairs being small. Moreover, how this differential methylation translates into differential measured phenotypes is unknown, as is whether the same effect is present for other tissues.
Overall, this study supports previous findings implicating the prenatal environment in mediating specific epigenome signatures that remain through to at least adolescence (Richmond et al., Reference Richmond, Simpkin, Woodward, Gaunt, Lyttleton, McArdle and Relton2015). This supports the idea of the importance of a positive prenatal environment due to its potential role in preventing disease through epigenetic dysregulation (Perera & Herbstman, Reference Perera and Herbstman2011).
We gratefully acknowledge the participation of the twins and their families. We thank Marlene Grace, Ann Eldridge, and Kerrie McAloney for sample collection and processing; the staff of the Molecular Epidemiology Laboratory at QIMR for DNA sample processing and preparation, and Harry Beeby and David Smyth for IT support.
This research was supported by NHMRC grants 1010374, 496667 and 1046880, and the National Institutes of Health (NIH) grants GM057091 and GM099568. A. F. M., B. B., G. W. M. are supported by the NHMRC Fellowship Scheme. We acknowledge funding by the Australian Research Council (A7960034, A79906588, A79801419, DP0212016, DP0343921), and the Australian National Health and Medical Research Council (NHMRC) Medical Bioinformatics Genomics Proteomics Program (grant 389891) for building and maintaining the adolescent twin family resource through which samples were collected.