A new, improved and generalizable approach for the analysis of biological data generated by -omic platforms

  • A. B. Pleasants (a1) (a2), G. C. Wake (a2) (a3), P. R. Shorten (a1) (a2), C. Z. W. Hassell-Sweatman (a4), C. A. McLean (a4), J. D. Holbrook (a5), P. D. Gluckman (a2) (a4) (a5) and A. M. Sheppard (a2) (a4)...

Abstract

The principles embodied by the Developmental Origins of Health and Disease (DOHaD) view of ‘life history’ trajectory are increasingly underpinned by biological data arising from molecular-based epigenomic and transcriptomic studies. Although a number of ‘omic’ platforms are now routinely and widely used in biology and medicine, the data they generate are frequently confounded by the frequency distribution of the measurement error (an inherent feature of the chemistry and physics of the measurement process), which adversely affects the accuracy of estimation and thus the inference of relationships to other biological measures such as phenotype. Based on empirically derived data, we have previously derived a probability density function to capture such errors and thus improve the confidence of estimation and inference based on such data. Here we use published open source data sets to calculate parameter values relevant to the most widely used epigenomic and transcriptomic technologies. Then, using our own data sets, we illustrate the benefits of this approach by specific application, in this instance to the measurement of DNA methylation, in cases where the level of methylation at specific genomic sites represents either (1) a response variable or (2) an independent variable. Further, we extend this formulation to the ‘bivariate’ case, in which the co-dependency of methylation levels at two distinct genomic sites is tested for biological significance. These tools not only allow greater accuracy of measurement and improved confidence of functional inference but, in the case of epigenomic data at least, also reveal otherwise cryptic information.
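
To make concrete why the assumed distribution of measurement error matters for estimation, the following minimal sketch compares maximum-likelihood estimates of a site's methylation level under two error models. It is an illustration only: the abstract does not state the form of the authors' density, so the Laplace distribution is assumed here as a stand-in heavy-tailed alternative to the normal, and the function names and replicate values are hypothetical.

```python
import statistics


def gaussian_fit(values):
    """MLE under normally distributed error: mean and population std dev."""
    return statistics.fmean(values), statistics.pstdev(values)


def laplace_fit(values):
    """MLE under Laplace-distributed error (an assumption for illustration):
    location is the sample median, scale is the mean absolute deviation
    from that median."""
    location = statistics.median(values)
    scale = statistics.fmean([abs(v - location) for v in values])
    return location, scale


# Hypothetical replicate methylation readings (%) at one genomic site,
# with a single aberrant measurement.
replicates = [41.0, 42.0, 43.0, 44.0, 70.0]

gauss_loc, gauss_scale = gaussian_fit(replicates)
lap_loc, lap_scale = laplace_fit(replicates)

print(f"normal error model:  location={gauss_loc:.1f}, scale={gauss_scale:.2f}")
print(f"Laplace error model: location={lap_loc:.1f}, scale={lap_scale:.2f}")
```

With these made-up readings the normal model places the site at 48.0%, pulled upward by the single aberrant value, whereas the Laplace model places it at 43.0%. The point is only that the choice of error density changes the estimate, and therefore any downstream inference about phenotype.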

Corresponding author

*Address for Correspondence: Dr A. M. Sheppard, Liggins Institute, The University of Auckland, Private Bag 92019, Victoria Street West, Auckland 1142, New Zealand. (E-mail a.sheppard@auckland.ac.nz)
