DNA methylation is a fundamental epigenetic modification implicated in many cellular processes, such as development, chromatin structure, maintenance of genomic imprinting, and X chromosome inactivation in females (Chow et al., Reference Chow, Yen, Ziesche and Brown2005; Delaval & Feil, Reference Delaval and Feil2004; Robertson, Reference Robertson2005). DNA methylation status across specific loci in the genome also mediates the regulation of gene expression and maintains genome integrity (Weber & Schubeler, Reference Weber and Schubeler2007). DNA methylation patterns are remarkably stable within functionally important regions (Kaminsky et al., Reference Kaminsky, Tang, Wang, Ptak, Oh, Wong and Petronis2009; Ushijima et al., Reference Ushijima, Watanabe, Okochi, Kaneda, Sugimura and Miyamoto2003) but can be highly variable in other genomic regions (Fraga et al., Reference Fraga, Ballestar, Paz, Ropero, Setien, Ballestar and Esteller2005; Mill et al., Reference Mill, Dempster, Caspi, Williams, Moffitt and Craig2006; Petronis et al., Reference Petronis, Gottesman, Kan, Kennedy, Basile, Paterson and Popendikyte2003), even in genetically identical individuals (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011; Fraga et al., Reference Fraga, Ballestar, Paz, Ropero, Setien, Ballestar and Esteller2005; Heijmans et al., Reference Heijmans, Kremer, Tobi, Boomsma and Slagboom2007). This suggests a possible involvement of environmental factors in the modulation of these changes. Interestingly, there is now increasing evidence that heritable factors may also play a role in the determination of DNA methylation patterns, at least at the genome-wide level (Bjornsson et al., Reference Bjornsson, Sigurdsson, Fallin, Irizarry, Aspelund, Cui and Feinberg2008; Boks et al., Reference Boks, Derks, Weisenberger, Strengman, Janson, Sommer and Ophoff2009; Numata et al., Reference Numata, Ye, Hyde, Guitart-Navarro, Tao, Wininger and Lipska2012).
Imprinting control regions (ICRs) are differentially methylated regulatory elements located near clusters of imprinted genes that coordinate preferential gene transcription in a parent-of-origin fashion (Constancia et al., Reference Constancia, Kelsey and Reik2004; Li et al., Reference Li, Beard and Jaenisch1993). ICRs entail segments of DNA rich in CpG dinucleotides, with the cytosine nucleotides displaying differential methylation depending upon parent-of-origin alleles. Whether the methylated/unmethylated state leads to silencing or not depends upon the default imprint state of the region (i.e., methylated or unmethylated; Lee et al., Reference Lee, Inoue, Ono, Ogonuki, Kohda, Kaneko-Ishino and Ishino2002). Abnormal DNA methylation within ICRs has been shown to result in loss of appropriate expression of imprinted genes. It is also associated with a number of diseases, such as Beckwith–Wiedemann syndrome (Smilinich et al., Reference Smilinich, Day, Fitzpatrick, Caldwell, Lossie, Cooper and Higgins1999; Weksberg et al., Reference Weksberg, Nishikawa, Caluseriu, Fei, Shuman, Wei and Squire2001, Reference Weksberg, Shuman, Caluseriu, Smith, Fei, Nishikawa and Squire2002), Silver–Russell syndrome (Eggermann et al., Reference Eggermann, Eggermann and Schonherr2008; Gicquel et al., Reference Gicquel, Rossignol, Cabrol, Houang, Steunou, Barbu and Le Bouc2005), Angelman syndrome, and Prader–Willi syndrome (Zeschnigk et al., Reference Zeschnigk, Schmitz, Dittrich, Buiting, Horsthemke and Doerfler1997). Due to the functional importance of genetic imprinting, it has been proposed that DNA methylation within ICRs might be under particular genetic regulation (Heijmans et al., Reference Heijmans, Kremer, Tobi, Boomsma and Slagboom2007).
To address the impact of genetic sequence on DNA methylation variation, we recently performed a detailed study in a sample comprising whole-blood DNA from 128 pairs of monozygotic (MZ) and 128 pairs of dizygotic (DZ) human twins (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). Twin studies have been particularly valuable in genetics research, and now offer an opportunity for the study of epigenetic variation as a dynamic quantitative trait. MZ pairs share the entirety of their genetic sequence and/or DZ pairs share approximately half of their genetic alleles, while both MZ and DZ pairs are usually subject to similar household conditions. Thus, by studying epigenetic differences in twins, it is possible to infer the proportion of epigenetic variation attributable to either environmental or genetic factors. We recently interrogated DNA methylation status of CpGs located within four ICRs (H19-ICR, IGF2-DMR, KvDMR, and NESPAS-ICR) and the non imprinted gene RUNX1 (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). Details of interrogated domains and individual CpGs are given in Figure S1 of our previous study (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). A high degree of variability in individual CpG methylation levels, notably at the H19/IGF2 loci, was observed (see Figure 2 of Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). And, overall, a similar pattern of methylation variation between MZ and DZ twins was found, with imprinted regions displaying median methylation levels around the expected 50%, and only a few single nucleotide polymorphisms (SNPs) showing increased mean methylation (H19-ICR_CpG02, H19-ICR_CpG03, IGF2-DMR_CpG05, and NESPAS-ICR_CpG14). Part of this DNA methylation plasticity seems to be clearly attributable to environmental and stochastic factors. However, concordant gains or losses of methylation were more common in MZ than DZ twin pairs, suggesting that de novo and/or maintenance methylation is influenced by the surrounding DNA sequence (see Table 1 of Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). This is in agreement with a previous longitudinal study, which reported familial clustering of methylation variation over time (Bjornsson et al., Reference Bjornsson, Sigurdsson, Fallin, Irizarry, Aspelund, Cui and Feinberg2008). Importantly, we also observed significant intra- but not inter-domain covariation in methylation state across ICRs (see Figure 3 in Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011) and showed that the rs10732516 (A/G) polymorphism, as well as the co-inherited rs2839701 (G/C) polymorphism — both located within the IGF2/H19 locus — are strongly associated with increased hypermethylation of specific CpG sites in the maternal H19 allele, which was later confirmed by clonal bisulfite sequencing analysis (see Figure 7 in Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). Additionally, we also performed a DNA sequencing analysis of all interrogated regions to exclude the possibility of bias introduced by genetic variation or mutations within primer binding sequences. Here, we expand our previous analyses to conduct a genome-wide search for other common polymorphisms that may be associated with methylation status within the ICRs of interest.
aMethylation ratios range between 0 and 1.
The identification of genetic variants that are associated with gene-specific DNA methylation could open a new avenue to the understanding of DNA methylation regulation. Genome-wide association studies (GWAS) of DNA methylation have been recently published and a number of methylation quantitative trait loci (mQTL) have been identified (Bell et al., Reference Bell, Pai, Pickrell, Gaffney, Pique-Regi, Degner and Pritchard2011; Gibbs et al., Reference Gibbs, van der Brug, Hernandez, Traynor, Nalls, Lai and Singleton2010), demonstrating the usefulness of incorporating this approach into epigenetics research. However, by treating each individual CpG as an independent variable, such studies have ignored the underlying correlation that exists across neighboring methylation sites.
In order to maximize power to detect common SNPs associated with DNA methylation within the ICRs of interest, we have used the MQFAM algorithm implementation for PLINK (Ferreira & Purcell, Reference Ferreira and Purcell2009), a multivariate GWAS approach that employs canonical correlation analysis to account for cross-trait covariance (Ferreira & Purcell, Reference Ferreira and Purcell2009). Multiple-trait GWAS improves power to detect association signals with pleiotropic effects over sets of correlated traits, without an increase in the false discovery rate (Bolormaa et al., Reference Bolormaa, Pryce, Hayes and Goddard2010), and has been applied to combinations of phenotypes such as blood lipid levels and gene expression (Inouye et al., Reference Inouye, Silander, Hamalainen, Salomaa, Harald, Jousilahti and Peltonen2010) or obesity, and osteoporosis measures (Liu et al., Reference Liu, Pei, Liu, Yang, Guo, Zhang and Deng2009).
Furthermore, as our previous analysis (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011) highlighted that SNP rs10732516 appears to have a substantial impact on the DNA methylation status of the H19-ICR region only when inherited on the maternal allele, we also conducted genome-wide family-based association using the quantitative transmission disequilibrium test (QTDT) package to test for quantitative association of both paternally and maternally inherited SNP alleles with the DNA methylation status at each methylation site (Spielman & Ewens, Reference Spielman and Ewens1996).
Materials and Methods
The twin samples analyzed are part of a study on moliness and cognitive function (Bataille et al., Reference Bataille, Snieder, MacGregor, Sasieni and Spector2000; Wright et al., Reference Wright, De Geus, Ando, Luciano, Posthuma, Ono and Boomsma2001) and comprise 512 adolescent twins (70 female/female and 58 male/male MZ twin pairs; 25 female/female, 29 male/male, and 74 opposite sex DZ twin pairs), with a mean age of 14.15 years (SD = 2.46; range 12–22.85). The samples are predominantly (>95%) of northern European origin (mainly Anglo-Celtic). Zygosity of the twins was confirmed using microsatellite repeat marker testing. Written informed consent to participate in this study was given by all participants and by their parents, legal guardians, or caretakers when participants were under-aged. The study protocol was approved by the Queensland Institute of Medical Research Human Research Ethics Committee (Ethics approvals P193 and P455). Additionally, as part of the study, parents were asked to complete a series of questionnaires and to donate blood samples.
DNA was extracted from blood samples and SNP genotyping performed with Illumina HumanHap 610W Quad arrays (http://support.illumina.com/array/array_kits/human610-quad_beadchip_kit/documentation.ilmn) by deCODE Genetics (Reykjavík, Finland). Genotype data were screened through a series of quality control criteria including Mendelian errors, minor allele frequency (MAF) ≥ 1%, p value of a Hardy–Weinberg equilibrium (HWE) test ≥ 10−6, SNP call rate > 95%, and Illumina Beadstudio GenCall score ≥ 0.7. We also screened for ancestral outliers by using principal component analysis (PCA; Price et al., Reference Price, Patterson, Plenge, Weinblatt, Shadick and Reich2006), comparing the genotyped data in the discovery sample with 16 global populations sourced from HapMap Phase 3 and northern European subpopulations from a previous study (McEvoy et al., Reference McEvoy, Montgomery, McRae, Ripatti, Perola, Spector and Visscher2009).
DNA Methylation Assay Design
MassCLEAVE™ assays against the genomic regions of interest were designed and tested using the AmpliconReport R-script that we described previously (Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007). In brief, the method consists of a Polymerase chain reaction (PCR) amplification of bisulfite-converted DNA tagged with a T7-promoter, which is followed by the generation of a single-stranded RNA molecule and base-specific cleavage (3′ to either rUTP or rCTP) with RNase A. The resulting mixture of cleavage products with different length and mass is analyzed with MALDI-TOF mass spectrometry. Differences in original DNA methylation state are reflected in changes in nucleotide sequence after bisulfite treatment, and therefore will produce different fragment masses in the assay. The abundance of each fragment (signal/noise level in the spectrum) acts as an indicator of the amount of DNA methylation in the interrogated sequence (Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007). The analyzed regions comprised imprinted regions H19-ICR (Takai et al., Reference Takai, Gonzales, Tsai, Thayer and Jones2001), IGF2-DMR (Cui et al., Reference Cui, Cruz-Correa, Giardiello, Hutcheon, Kafonek, Brandenburg and Feinberg2003), KvDMR (Nakano et al., Reference Nakano, Murakami, Meguro, Soejima, Higashimoto, Urano and Oshimura2006; Smilinich et al., Reference Smilinich, Day, Fitzpatrick, Caldwell, Lossie, Cooper and Higgins1999), and NESPAS-ICR (Liu et al., Reference Liu, Chen, Deng, Bourc'his, Nealon, Erlichman and Weinstein2005), and the promoter of the non-imprinted RUNX1 gene (see Figure S1 in Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). In this study, we focus our analyses exclusively on the ICRs.
Genomic Bisulfite Treatment
DNA methylation measurements were performed on genomic DNA extracted from whole blood. Bisulfite treatment was carried out using the EZ-96 DNA Methylation-Gold Kit (Zymo Research: Cat No. D5008) according to the manufacturer's instructions. In brief, 500 ng DNA was used in the bisulfite reaction and incubated for 8 hours at 55 °C. After desulfonation and clean up, the bisulfite treated DNA was resuspended in 50 μL of which 2 μL was used in each PCR.
PCR-Tagging, In Vitro Transcription and Mass Spectrometry Analysis
The target regions were amplified in triplicate using the primer pairs and annealing temperatures (Ta) described in Table S1 of Coolen et al. (Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). The MassCleave methylation analysis was performed as described previously (Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007). In brief, the PCR reactions were carried out in a total volume of 5 μL using 200 nM forward and reverse primer, 200 μM Deoxyribonucleotide triphosphate (dNTP), 1.5 mM MgCl2 1:100,000 dilution of SYBR Green (Invitrogen) and 0.35 U Platinum Taq DNA polymerase (Invitrogen) in 1 × PCR buffer without magnesium. PCR success was determined via melt curve analysis (ABI PRISM® 7700). Unincorporated dNTPs were dephosphorylated by incubation at 37 °C for 20 minutes in the presence of 1.7 μL H2O and 0.3 U Shrimp Alkaline Phosphatase (SAP), followed by a heat-inactivation for 5 minutes at 85 °C. The triplicate PCR samples after SAP treatment were pooled and of this pool, 2 μL were used in a 7 μL transcription reaction, containing 3.14 mM DTT, 2.5 mM dCTP, 1 mM rUTP, 1 mM rGTP, 1 mM rATP, 20 U T7 R&DNA polymerase, and 0.09 mg/mL RNase A in 0.64 × T7 polymerase buffer (all reagents from SEQUENOM, San Diego). Transcription and digestion were performed in the same step at 37 °C for 3 hours. After the addition of 20 μL H2O, conditioning of the phosphate backbone prior to MALDI-TOF MS was achieved by the addition of 6 mg CLEAN Resin (SEQUENOM, San Diego). Twenty-two nanoliters of the cleavage reactions were robotically dispensed onto silicon chips preloaded with matrix (SpectroCHIPs; SEQUENOM, San Diego). Mass spectra were collected using a MassARRAY mass spectrometer (SEQUENOM).
Calculation of Methylation Ratios
We used a sensitive and high-throughput method for DNA methylation analysis that is quantitative to 5% methylation for each informative CpG residue (Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007). Calculation of the DNA methylation ratios was performed using the R-script Analyze Sequenom Function (ASF; Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007), which is an adaptation of the formula used by MassCLEAVE™ software. A full description of ASF has been published previously (Coolen et al., Reference Coolen, Statham, Gardiner-Garden and Clark2007). All statistical calculations were carried out using either Stata 9 (StataCorp LP, Texas, USA) or the free ‘R’ software package for statistical computing (http://www.R-project.org).
Four parallel multiple-trait association were conducted with the MQFAM (Ferreira & Purcell, Reference Ferreira and Purcell2009) algorithm as implemented in PLINK (Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender and Sham2007). Each study comprised CpGs contained within each ICR domain. H19-ICR (12 variables), IGF2-DMR (seven variables), KvDMR (11 variables), and NESPAS-ICR (11 variables). Descriptive statistics for these variables are presented in Table 1. MQFAM uses canonical correlation analysis (CCA) to measure the association between two sets of variables. To explain how CCA works, it is possible to make the following analogy with PCA: PCA is usually applied to one set (X) of possibly correlated traits to extract a number of independent variates (components) that explain as much variance in the original set, whereas CCA is applied to two sets of variables (X and Y) to extract a number of independent pairs of variates (Ui, Vi) that explain as much covariance between the two original sets. More details on the method can be found at the MQFAM support site (http://genepi.qimr.edu.au/staff/manuelF/multivariate/main.html). Data were adjusted to correct for any sex, age, age2, sex*age, and sex*age2 effects. Although PLINK does not account for family structure, MQFAM offers the possibility of running permutation testing to correct for relatedness. However, given that this analysis is slow and computationally intensive when many traits are analyzed at the same time, we performed a two-stage analysis in order to reduce computation time. In the first stage, association testing was performed on a filtered dataset which included only one individual per family (N = 256), hence making permutation correction unnecessary, and allowing us to identify a subset of potentially associated ‘candidate’ SNPs (cut-off was set at p value < .0001). Then, in the second stage of the analysis, we tested this subset of ‘candidate’ SNPs for association in the sample consisting of all DZ twins (N = 256 individuals from 128 pairs) and one MZ from each pair (N = 128 individuals), resulting in a total sample of 384 individuals. We accounted for relatedness using 1 million permutations at each locus.
DNA methylation status was tested for standard single-trait association at the individual methylation unit level using MERLIN (Abecasis et al., Reference Abecasis, Cherny, Cookson and Cardon2002). Unlike PLINK, MERLIN corrects for relatedness by incorporating pedigree data. We also controlled for any age, sex, age2, sex*age, and sex*age2 effects. The total sample size for single-trait association was N = 512.
Family-Based Association Testing (Parent-of- Origin Ass)
We conducted family-based genome-wide testing for quantitative total association of both paternally and maternally inherited SNP alleles with DNA methylation levels at individual methylation sites across four ICRs using QTDT (Spielman & Ewens, Reference Spielman and Ewens1996). QTDT is a software package for family-based linkage disequilibrium analyses of quantitative and discrete traits that can use all the information in a pedigree to construct powerful tests of association (Spielman & Ewens, Reference Spielman and Ewens1996). The total sample size (N = 1,024) consisted of 512 twins (128 MZ and 128 DZ pairs) and their parents (N = 512).
Multivariate Intra-Domain GWAS
We performed multivariate GWAS by testing 515,966 genotyped SNPs for association with intra-domain DNA methylation variation at four ICRs (H19-ICR, IGF2-DMR, KvDMR, and NESPAS-ICR) using the MQFAM (Ferreira & Purcell, Reference Ferreira and Purcell2009) algorithm implementation for PLINK (Purcell et al., Reference Purcell, Neale, Todd-Brown, Thomas, Ferreira, Bender and Sham2007). MQFAM performs CCA to measure the association between two sets of variables by extracting the linear combination of traits that explain the largest possible amount of the covariation between the marker (SNP) and all traits. As detailed elsewhere (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011), the number of analyzed DNA methylation sites by region was 14 (H19-ICR), 8 (IGF2-DMR), 15 (KvDMR), and 17 (NESPAS-ICR). Data were adjusted for any age or sex effects (and their interactions) prior to association analysis. Due to method restrictions (PLINK is unable to account for relatedness of individuals in the sample), we structured our analysis in two stages. First, we tested for association in a sample subset containing only one individual per family selected at random (N = 256) and identified possible ‘candidate’ SNPs (those with p value < .0001). The candidate SNPs were then tested for association in a larger dataset consisting of 384 individuals (all DZ pairs, N = 256; and one MZ twin per family, N = 128; this is because although MQFAM permutation can account for DZ zygosity, it cannot adjust for genetically identical individuals).
After permutation correction, multivariate GWAS results (shown in Figure 1) revealed no SNPs with p values surpassing the genome-wide significance threshold. Table 1 shows a list of the most significantly associated SNPs with methylation within each ICR; rs4930103, the most significantly associated SNP with methylation at the H19-ICR, is in Linkage disequilibrium (LD) with SNPs rs2839701 and rs10732516 (r 2 = 0.854), and is part of the haplotype that was previously described (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). A locus zoom view of this region is provided in Figure 2.
We tested the association of rs4930103 with individual CpG sites within H19-ICR by conducting standard single-trait association testing on the entire dataset (N = 512) using MERLIN (Abecasis et al., Reference Abecasis, Cherny, Cookson and Cardon2002). Our analyses revealed that association was at least nominally significant (p ≤ .05) in five out of the twelve methylation sites surveyed, and that the three CpG sites with the most significant p values (H19-ICR_CpG02, p = 3.582e-15; H19-ICR_CpG03, p = 1.118e-05; and H19-ICR_CpG04, p = 2.88e-04; notation as in Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011) cluster within a 50 bp region. This indicates that the observed association signal between rs4930103 and methylation levels within the H19-ICR domain does not predispose methylation at all CpG sites of the domain, but it may be restricted instead to specific loci.
Multivariate GWAS at NESPAS-ICR identified rs965808 as the most significantly associated SNP. rs965808 is located within an intronic region of the GNAS and GNAS-AS1 imprinted genes, which are regulated by NESPAS-ICR, clearly suggesting a possible regulatory effect of the association signal observed between rs965808 and methylation of the NESPAS domain (a locus zoom view of this region is also shown in Figure 2).
Other SNPs identified through multivariate GWAS include rs11897432, rs2412488, and rs2555155, associated with methylation at H19-ICR; rs10462794, rs11227306, rs9596905, rs4304977, rs1007190, and rs1004689, associated with methylation at IGF2-DMR; rs7644516, rs11933531, rs7027203, and rs3858526, associated with KvDMR methylation; and rs17261688, rs3763558, rs724210, rs1022588, and rs965808, associated with methylation within NESPAS-ICR. Table 2 shows most significantly associated SNPs, their location, closest genes, F test statistic (an indicator of accumulated evidence of association), loadings per trait (an indicator of each trait contribution to total association), and p values before and after permutation correction. While some SNPs map within or near coding genes, it is not obvious the role of such genes in modulating methylation status, as previous reports in the literature are limited. Hence, the identification of novel genetic variants opens up new possibilities for the understanding of both pre-programmed and dynamic epigenetic mechanisms.
SNPs located in neighbouring regions to relevant DMRs are highlighted in bold.
aCoordinates according to GRCh37.p5; bp value before correcting for relatedness; cp value corrected for relatedness; dLoadings reflect contribution of individual traits (independent CpGs within the same DMR) to multivariate association with the given SNP.
To assess homogeneity of association signals across each ICR, we performed single-trait association using MERLIN for each suggestively associated SNP. rs2555155 and rs2412488, which were associated with methylation at H19-ICR were at least nominally significant in all and all but two sites, respectively. rs2555155 is located in an intronic region of genes DNHD1 (Dynein heavy chain domain containing protein 1) and FXC1 (Mitochondrial import inner membrane translocase subunit Tim9 B) in chromosome 11. On the other hand, rs2412488 is located within an intronic region of overlapping genes FIP1L1 (Pre-mRNA 3′-end-processing factor FIP1) and LNX1 (E3 ubiquitin-protein ligase) in chromosome 4. A Locus zoom view of these two regions is shown in Figure 3.
Parent-of-Origin Association With DNA Methylation
The second part of the analysis consisted of genome-wide screenings for evidence of parent-of-origin association between SNPs and DNA methylation at individual methylation sites. We used the genotypes and phenotypes from the whole twin sample (N = 512), as well as the genotypes of their parents (N = 512), to perform a family-based analysis with QTDT (Spielman & Ewens, Reference Spielman and Ewens1996) to test for differences in association dependent upon paternal or maternal allelic origin. This was to expand on our previous finding where rs10732516 and co-inherited rs2839701 show a strong association over the DNA methylation status of specific CpG sites within the H19-ICR, but only when inherited from the maternal allele. A summary of most significantly associated SNPs is shown in Table 3.
SNPs located in neighboring regions to relevant DMRs are highlighted in bold.
aPaternal effect; bMaternal effect; cp value (correcting for relatedness).
As expected, genome-wide parent-of-origin association analysis also detected the previously described association of rs4930103 within the H19-ICR region haplotype that contains binding sites for the insulator protein CTCF (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). H19-ICR-CpG02 showed the highest parent-of-origin association with rs4930103 (p = 5.00e-16). The A allele displayed an association effect (β) of -0.129 standard deviations (SD), when inherited from the paternal side, compared with 1.189 SD when inherited from the maternal side This is equivalent to -0.01 and 0.12 (in a scale of 0 to 1.0) in methylation ratio units, respectively. Likewise, SNP rs2285935 (in LD with rs4930103, r 2 = 0.854) also reached genome-wide significance association with H19-ICR-CpG04 (p = 2.00e-11), whereas rs2067051 was the most significant SNP associated with methylation at H19-ICR_CpG06 and H19-ICR_CpG22 (1.00e-08).
SNP rs10012307, located in an intergenic region of chromosome 4 near TERF1P3 (telomeric repeat binding factor 1 pseudogene 3), was associated with methylation at H19-ICR_CpG_09-10. Interestingly, varying degrees of significance were observed for SNPs in this region, going from no association to suggestive association to genome-wide significant. Figure 4 shows variation in parent-of-origin association results observed across different CpGs of the H19-ICR highlighting regions around rs10012307 and rs4930103.
The most significantly SNP associated with methylation at IGF2-DMR_CpG01 identified through parent-of-origin association was rs4819833 (6.00e-08), which mapped on chromosome 22 neighboring the SEPT5, GP1BB, and AC000093.3.1 genes. Furthermore, SNP rs7931462, located within RPL26P31 (ribosomal protein L26 pseudogene 31) gene in chromosome 11 showed association with DNA methylation at KvDMR_10-11-12 (p = 2.00E-09). Currently, the function of these genes is unknown.
Finally, methylation ratios at KvDMR_21, NESPAS_CpG08_09, and NESPAS_CpG02 showed suggestive associations with rs10497324, rs10934011, and rs13135284, which locate in intergenic regions of chromosomes 2, 3, and 4, respectively (p = 2.00e-09, 1.00e-09, and 5.00e-08).
Unlike the genome, which is determined at the time of conception and remains nearly unchanged across all cells of the body during the life span of an individual, the epigenome is greatly variable from tissue to tissue (Feinberg, Reference Feinberg2007; Mill et al., Reference Mill, Tang, Kaminsky, Khare, Yazdanpanah, Bouchard and Petronis2008; Petronis et al., Reference Petronis, Gottesman, Kan, Kennedy, Basile, Paterson and Popendikyte2003). Dynamic changes can result either from carefully directed epigenetic reprogramming processes during development or from environmental exposure, aging or stochastic factors (Horsthemke, Reference Horsthemke2006; Schanen, Reference Schanen2006; Schneider et al., Reference Schneider, Pliushch, Hajj, Galetzka, Puhl, Schorsch and Haaf2010; Siegmund et al., Reference Siegmund, Connor, Campan, Long, Weisenberger, Biniszkiewicz and Akbarian2007; Waterland & Jirtle, Reference Waterland and Jirtle2003).
While the dynamic nature of the epigenome is widely acknowledged, little is known about what drives this variation. Some studies have highlighted the importance of the environment while others have focused on the effects of genetic variation. In the present study, we applied two different types of genome-wide scans to look for genetic variants that contribute either to individual variation in the DNA methylation status within four imprinting domains or to single CpG methylation variation in a parent-of-origin fashion.
The most significantly associated SNPs found through multivariate GWAS lie in the neighboring regions of the analyzed imprinting regions: rs2067051 with H19-ICR and rs965808 with NESPAS-ICR, which provides support for our approach. The H19 region contains binding sites for CTCF, a critical zinc-finger CCCTC-binding factor, and we speculate that this polymorphism may directly affect the binding affinity of the insulator protein CTCF to this region (Coolen et al., Reference Coolen, Statham, Qu, Campbell, Henders, Montgomery and Clark2011). Notably, we were also able to detect SNPs located further from the interrogated ICRs that could be associated with methylation variation across CpGs of certain ICRs. It is worth noting that the significance of most significantly associated SNPs was of comparable level, regardless of their location (i.e., neighboring the interrogated regions or on different chromosomes). However, the role of SNPs located on different chromosomes is not evident, given that most of such variants lie within intronic regions or functionally uncharacterized intergenic regions.
SNP rs2067051 is of particular interest given that two prior independent studies (Adkins et al., Reference Adkins, Somes, Morrison, Hill, Watson, Magann and Krushkal2010b; Petry et al., Reference Petry, Seear, Wingate, Acerini, Ong, Hughes and Dunger2011) found association between the haplotypes that contain rs2067051 and birth weight. A third study that looked specifically at the association of birth weight with polymorphisms within and around the IGF2, H19, and IGF2R genes found that rs2067051, together with rs2251375 and rs4929984, is strongly associated with birth weight. Subsequent analysis also determined that association of the maternal genotype with newborn birth weight was due to parent-of-origin effects and not to direct maternal or direct newborn effects (Adkins et al., Reference Adkins, Somes, Morrison, Hill, Watson, Magann and Krushkal2010b). IGF2 is believed to be a major fetal growth factor, and H19-ICR plays a crucial role in regulating transcript levels of both H19 and IGF2. Hence, altogether, these studies support the link between parent-of-origin DNA methylation at H19-ICR and birth weight. Interestingly, maternally transmitted alleles within the GNAS gene, which is regulated by NESPAS-ICR, had also been associated previously with birth weight in male newborns (Adkins et al., Reference Adkins, Krushkal, Magann, Klauser, Morrison, Ramsey and Somes2010a). This was later supported by a recent study which analyzed ~20,000 CpG sites and found that methylation of genes involved in metabolism and biosynthesis was associated with birth weight (Gordon et al., Reference Gordon, Joo, Powell, Ollikainen, Novakovic, Li and Saffery2012). However, such study did not include analysis of parent-of-origin effects and was performed in a significantly smaller dataset (N = 68; Gordon et al., Reference Gordon, Joo, Powell, Ollikainen, Novakovic, Li and Saffery2012).
Genetic imprinting generally indicates that a selective advantage exists for the strict maintenance of allelic silencing of a gene or a discrete set of genes. Results from recent studies suggest that only a fraction of imprinted genes have been documented. For instance, Gregg et al. (Reference Gregg, Zhang, Weissbourd, Luo, Schroth, Haig and Dulac2010b) described over 1,300 loci with parental bias in the expression of individual genes and of specific transcript isoforms in the mouse brain. The differences between different brain regions were just as striking. For example, they reported preferential maternal contribution to gene expression in the developing brain and a major paternal contribution in the adult brain. In another paper, Gregg et al. (Reference Gregg, Zhang, Butler, Haig and Dulac2010a) also described imprinting differences between males and females. Future studies could test whether parent-of-origin genetic effects on methylation also contribute to regulate individual variation in gene expression levels of these newly discovered imprinted genes.
We acknowledge that our study presents some limitations. First, the relatively moderate associations between SNPs and DNA methylation, together with the relatively small sample size of the study, provide a limited statistical power to detect associations beyond the genome-wide significance threshold. However, by incorporating multivariate and family-based models of association, we were able to make a better use of the evidence contained in both the genotype and phenotype data. Furthermore, the fact that we were able to detect associations between ICRs and SNPs located in neighboring regions, via a hypothesis-free approach such as GWAS, also reduces uncertainty about the validity of our results, and makes it plausible that associations found in other regions are also valid. Our analysis also revealed that heterogeneity of associations across CpGs within the same ICR is rather high. As shown by the loading factors observed in multivariate association analysis (Table 2), a given genetic polymorphism might affect DNA methylation at some but not all CpGs within a region (i.e., an ICR). Our approach was designed to only detect those SNPs that display either a consistent association across all or the majority of the surveyed methylation sites within a region, or large parent-of-origin-dependent variation. Thus, it was not able to detect SNPs that have a significant association on specific individual CpGs.
With respect to previous studies, our analysis is the first to apply both multiple-trait GWAS and genome-wide parent-of-origin association approaches. Despite the relatively small sample size, we identified SNPs with p values that display suggestive associations. These include two SNPs located near the surveyed regions: the previously reported rs2067051 (associated with methylation at H19-ICR) and the novel rs965808 (associated with methylation at NESPAS-ICR). Furthermore, a number of other potentially associated SNPs were identified, a number of which display parent-of-origin association. Overall, our study provides a proof of principle for the use of alternative GWAS approaches to epigenetics. However, due to the customized design of the study, no replication cohort could be found.
Future studies with larger sample sizes should also include comparisons between methylation status across different tissues, and family-based studies could provide exciting information about the mechanisms underlying the parent-of-origin associations with DNA methylation and imprinting. However, an immeasurable complexity involving genetic and environmental factors is anticipated.
We especially thank the twins and their families for their participation; Marlene Grace, Ann Eldridge, and Kerrie McAloney for sample collection and processing; Lisa Bowdler, Steven Crooks, and staff of the Molecular Epidemiology Laboratory at QIMR for DNA sample processing and preparation; Harry Beeby, David Smyth for IT support; and Drs Dale R. Nyholt and Scott Gordon for their substantial efforts involving the QC and preparation of the GWA datasets. The research was supported by the Australian Research Council (A7960034, A79906588, A79801419, DP0212016, DP0343921), with genotyping funded by the National Health and Medical Research Council (Medical Bioinformatics Genomics Proteomics Program, 389891). The Genetics Cluster Computer (http://www.geneticcluster.org), which is financially supported by the Netherlands Organization for Scientific Research (NWO 480-05-003), was used for statistical analyses. Miguel E. Rentería acknowledges the support received through an Endeavour IPRS and UQRS Award, as well as ANZ Trustees Scholarship in Medical Research.
MWC, MER, SEM, and SJC conceived and designed the experiments. MWC, ALS, WQ, SS, and MJC performed the experiments. MER, MWC, and ALS analyzed the data. AKH, GWM, and NGM contributed reagents, materials, and analysis tools. MER, SMC, SJC, NGM, and SEM wrote the paper.
The authors declare no competing interests.