14 - eQTL mapping

from Part III - Single nucleotide polymorphisms, copy number variants, haplotypes and eQTLs

Published online by Cambridge University Press:  18 December 2015

Mengjie Chen
Affiliation:
Yale University
Can Yang
Affiliation:
Hong Kong Baptist University
Cong Li
Affiliation:
Yale University
Hongyu Zhao
Affiliation:
Yale University
Krishnarao Appasani
Affiliation:
GeneExpression Systems, Inc., Massachusetts
Stephen W. Scherer
Affiliation:
University of Toronto
Peter M. Visscher
Affiliation:
University of Queensland
Summary

Introduction

With the influx of successful genome-wide association studies identifying genetic variants associated with complex diseases, an unprecedented wealth of knowledge has accumulated on SNP–phenotype associations (McCarthy et al., 2008; Witte 2010; Manolio 2013). However, many SNP–disease associations do not lend themselves to molecular interpretation, because many of the identified loci lie outside coding regions. Even when a gene can be inferred to be causal, there is often a significant gap in understanding the underlying molecular mechanisms (Schadt et al., 2005; McCarthy et al., 2008). Genome-wide eQTL mapping has been one effective approach to bridging this gap (Mackay et al., 2009). In eQTL studies, gene expression levels measured by high-throughput technologies, such as microarrays and RNA-Seq, are treated as quantitative traits. Marker genotypes are collected from the same set of individuals, and statistical analyses are performed to detect associations between markers and expression traits. By simultaneously capturing many regulatory interactions, eQTLs offer valuable insights into the genetic architecture of expression regulation (Rockman and Kruglyak 2006). The ultimate goal of eQTL studies is to elucidate how genetic variations affect phenotypes by using gene expression levels as intermediate molecular phenotypes (Nica and Dermitzakis 2008). In this chapter, we provide an overview of the eQTL analysis workflow (Figure 14.1), introduce publicly available tools for analysis, and further discuss challenges and issues.
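To make the idea of treating expression as a quantitative trait concrete, the sketch below tests a single marker against a single expression trait with ordinary least squares. It is only an illustration: the additive 0/1/2 genotype coding, the choice of simple linear regression, and the function name `marker_trait_association` are assumptions made here, not methods stated in this excerpt.

```python
import math

def marker_trait_association(genotypes, expression):
    """Regress one expression trait on one marker (assumed 0/1/2 allele counts).

    Returns (slope, t_statistic, degrees_of_freedom) for the marker effect.
    """
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 individuals")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    # Sum of squares of the genotype and its cross-product with expression
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    # Residual sum of squares gives the standard error of the slope
    rss = sum((e - intercept - slope * g) ** 2
              for g, e in zip(genotypes, expression))
    df = n - 2
    se = math.sqrt(rss / df / sxx)
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat, df
```

In a genome-wide setting a test of this kind would be repeated for every marker and expression trait pair, which is why multiple-testing control matters in practice.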

Data pre-processing

Genome-wide eQTL mapping considers high-density SNP genotype data and gene expression data from the same individuals in a segregating population. Both require appropriate pre-processing as described below for subsequent analysis.

Genotype data

Three quality control (QC) criteria are often used in the pre-processing of the genotype data. (1) Missing rate: individuals with a large proportion of missing SNP genotypes (e.g., more than 10%) should be excluded because the DNA samples of those individuals may be of poor quality. SNPs with a large missing rate (e.g., more than 5%) should also be filtered out. (2) Hardy–Weinberg equilibrium (HWE): statistically significant deviations from HWE often result from genotyping errors. Therefore, SNPs that fail an exact HWE test (e.g., a P-value less than 0.001) should be filtered out. This criterion does not apply to haploid organisms, such as yeast. (3) Minor allele frequency (MAF): SNPs with low MAF (e.g., below 0.05) are sometimes filtered out because statistical power is insufficient in studies with a relatively small sample size and because genotype calling errors are potentially more frequent for such SNPs.
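The three criteria above can be sketched as a small filter over a genotype matrix. This is a minimal illustration under stated assumptions, not the chapter's implementation: genotypes are assumed to be diploid and coded as 0/1/2 copies of one allele with `None` for a missing call, individuals are filtered before SNPs, the function names `hwe_exact_pvalue` and `qc_filter` are hypothetical, and the default thresholds are simply the example values quoted in the text. The exact test sums the probabilities of all heterozygote counts that are no more likely than the observed one, given the allele counts.

```python
import math

def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE test P-value from diploid genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    n_a = 2 * n_aa + n_ab      # copies of allele A
    n_b = 2 * n - n_a          # copies of allele B

    def log_prob(het):
        # log P(het heterozygotes | n individuals, n_a copies of A)
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (het * math.log(2.0) + math.lgamma(n + 1)
                - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
                - math.lgamma(hom_b + 1)
                + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
                - math.lgamma(2 * n + 1))

    log_obs = log_prob(n_ab)
    # Heterozygote counts must share the parity of n_a
    total = 0.0
    for het in range(n_a % 2, min(n_a, n_b) + 1, 2):
        lp = log_prob(het)
        if lp <= log_obs + 1e-9:
            total += math.exp(lp)
    return min(1.0, total)

def missing_rate(values):
    return sum(v is None for v in values) / len(values)

def qc_filter(geno, max_ind_missing=0.10, max_snp_missing=0.05,
              hwe_p=0.001, min_maf=0.05):
    """geno: list of individuals, each a list of 0/1/2/None per SNP.

    Returns (keep_ind, keep_snp) as lists of booleans.
    """
    # (1a) drop individuals with too many missing genotypes
    keep_ind = [missing_rate(row) <= max_ind_missing for row in geno]
    kept_rows = [row for row, keep in zip(geno, keep_ind) if keep]
    keep_snp = []
    for j in range(len(geno[0])):
        col = [row[j] for row in kept_rows]
        called = [g for g in col if g is not None]
        # (1b) drop SNPs with too many missing calls
        if not called or missing_rate(col) > max_snp_missing:
            keep_snp.append(False)
            continue
        counts = [called.count(v) for v in (0, 1, 2)]
        freq = sum(called) / (2 * len(called))
        maf = min(freq, 1 - freq)
        # (2) HWE exact test and (3) minor allele frequency
        keep_snp.append(maf >= min_maf
                        and hwe_exact_pvalue(*counts) >= hwe_p)
    return keep_ind, keep_snp
```

For haploid organisms the HWE step would be skipped, as noted above; in this sketch that amounts to setting `hwe_p=0` and adapting the allele-frequency calculation to the haploid coding.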

Chapter in Genome-Wide Association Studies: From Polymorphism to Personalized Medicine, pp. 208–228
Publisher: Cambridge University Press
Print publication year: 2016

References

Ashburner, M., Ball, C.A., Blake, J.A., et al. (2000). Gene ontology: tool for the unification of biology. Nature Genet., 25, 25–29.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. Ser. B (Method.), 57, 289–300.
Bohnert, R. and Rätsch, G. (2010). rQuant.web: a tool for RNA-Seq-based transcript quantitation. Nucleic Acids Res., 38, W348–W351.
Bolstad, B.M., Irizarry, R.A., Åstrand, M. and Speed, T.P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics, 19, 185–193.
Brem, R.B., Yvert, G., Clinton, R. and Kruglyak, L. (2002). Genetic dissection of transcriptional regulation in budding yeast. Science, 296, 752–755.
Broman, K.W., Wu, H., Sen, Ś. and Churchill, G.A. (2003). R/QTL: QTL mapping in experimental crosses. Bioinformatics, 19, 889–890.
Browning, S.R. and Browning, B.L. (2007). Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet., 81, 1084–1097.
Cai, T.T., Li, H., Liu, W. and Xie, J. (2013). Covariate-adjusted precision matrix estimation with an application in genetical genomics. Biometrika, 100, 139–156.
Carey, V.J. (2013). GGtools: Genetics of Gene Expression with Bioconductor. R package version 4.6.2.
Chen, L.S., Sangurdekar, D.P. and Storey, J.D. (2011). trigger: Transcriptional Regulatory Inference from Genetics of Gene ExpRession. R package version 1.4.0.
Chen, M., Ren, Z., Zhao, H. and Zhou, H. (2015). Asymptotic normal estimation of covariate-adjusted Gaussian graphical model. J. Am. Stat. Ass. Theory Meth. (in press).
Da Huang, W., Sherman, B.T. and Lempicki, R.A. (2008). Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols, 4, 44–57.
Delaneau, O., Zagury, J.-F. and Marchini, J. (2012). Improved whole-chromosome phasing for disease and population genetic studies. Nature Meth., 10, 5–6.
Dillies, M.-A., Rau, A., Aubert, J., et al. (2013). A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. Brief. Bioinform., 14, 671–683.
Dunning, M.J., Smith, M.L., Ritchie, M.E. and Tavaré, S. (2007). beadarray: R classes and methods for Illumina bead-based data. Bioinformatics, 23, 2183–2184.
Eden, E., Navon, R., Steinfeld, I., Lipson, D. and Yakhini, Z. (2009). GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists. BMC Bioinform., 10, 48.
Fusi, N., Stegle, O. and Lawrence, N.D. (2012). Joint modelling of confounding factors and prominent genetic regulators provides increased accuracy in genetical genomics studies. PLoS Comput. Biol., 8, e1002330.
Gagnon-Bartsch, J.A. and Speed, T.P. (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13, 539–552.
Gautier, L., Cope, L., Bolstad, B.M. and Irizarry, R.A. (2004). affy – analysis of Affymetrix GeneChip data at the probe level. Bioinformatics, 20, 307–315.
Guttman, M., Garber, M., Levin, J.Z., et al. (2010). Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs. Nature Biotechnol., 28, 503–510.
Haley, C.S., Knott, S.A. and Elsen, J. (1994). Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics, 136, 1195–1207.
Hamel, L.-P., Nicole, M.-C., Duplessis, S. and Ellis, B.E. (2012). Mitogen-activated protein kinase signaling in plant-interacting fungi: distinct messages from conserved messengers. Plant Cell Online, 24, 1327–1351.
Johnson, W.E., Li, C. and Rabinovic, A. (2007). Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics, 8, 118–127.
Kanehisa, M. and Goto, S. (2000). KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res., 28, 27–30.
Kang, H.M., Ye, C. and Eskin, E. (2008). Accurate discovery of expression quantitative trait loci under confounding from spurious and genuine regulatory hotspots. Genetics, 180, 1909–1925.
Katz, Y., Wang, E.T., Airoldi, E.M. and Burge, C.B. (2010). Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nature Meth., 7, 1009–1015.
Kim, D., Pertea, G., Trapnell, C., et al. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol., 14, R36.
Kim, S. and Xing, E.P. (2012). Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. Ann. Appl. Stat., 6, 1095–1117.
Lander, E.S. and Botstein, D. (1989). Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics, 121, 185–199.
Lee, S., Zhu, J. and Xing, E.P. (2010). Adaptive multi-task lasso: with application to eQTL detection. In Advances in neural information processing systems, pp. 1306–1314.
Leek, J.T., Scharpf, R.B., Bravo, H.C., et al. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Rev. Genet., 11, 733–739.
Li, B., Ruotti, V., Stewart, R.M., Thomson, J.A. and Dewey, C.N. (2010a). RNA-Seq gene expression estimation with read mapping uncertainty. Bioinformatics, 26, 493–500.
Li, B., Chun, H. and Zhao, H. (2012b). Sparse estimation of conditional graphical models with application to gene networks. J. Am. Statist. Ass., 107, 152–167.
Li, C. and Wong, W.H. (2001). Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application. Genome Biol., 2, 1–11.
Li, J.J., Jiang, C.-R., Brown, J.B., Huang, H. and Bickel, P.J. (2011). Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation. Proc. Natl Acad. Sci. USA, 108, 19867–19872.
Li, L., Zhang, X. and Zhao, H. (2012b). eQTL. In Quantitative Trait Loci (QTL). Springer, pp. 265–279.
Li, Y., Álvarez, O.A., Gutteling, E.W., et al. (2006). Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet., 2, e222.
Li, Y., Willer, C.J., Ding, J., Scheet, P. and Abecasis, G.R. (2010b). MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol., 34, 816–834.
Listgarten, J., Kadie, C., Schadt, E.E. and Heckerman, D. (2010). Correction for hidden confounders in the genetic analysis of gene expression. Proc. Natl Acad. Sci. USA, 107, 16465–16470.
Mackay, T.F., Stone, E.A. and Ayroles, J.F. (2009). The genetics of quantitative traits: challenges and prospects. Nature Rev. Genet., 10, 565–577.
Manolio, T.A. (2013). Bringing genome-wide association findings into clinical use. Nature Rev. Genet., 14, 549–558.
McCarthy, M.I., Abecasis, G.R., Cardon, L.R., et al. (2008). Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet., 9, 356–369.
Michaelson, J.J., Loguercio, S. and Beyer, A. (2009). Detection and interpretation of expression quantitative trait loci (eQTL). Methods, 48, 265–276.
Mortazavi, A., Williams, B.A., McCue, K., Schaeffer, L. and Wold, B. (2008). Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Meth., 5, 621–628.
Nica, A.C. and Dermitzakis, E.T. (2008). Using gene expression to investigate the genetic basis of complex disorders. Hum. Molec. Genet., 17, R129–R134.
Obozinski, G., Wainwright, M.J. and Jordan, M.I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist., 39, 1–47.
Pastinen, T., Ge, B. and Hudson, T.J. (2006). Influence of human genome polymorphism on gene expression. Hum. Molec. Genet., 15, R9–R16.
Price, A.L., Patterson, N.J., Plenge, R.M., et al. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet., 38, 904–909.
Purcell, S., Neale, B., Todd-Brown, K., et al. (2007). PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet., 81, 559–575.
Richard, H., Schulz, M.H., Sultan, M., et al. (2010). Prediction of alternative isoforms from exon expression levels in RNA-Seq experiments. Nucleic Acids Res., 38, e112–e112.
Robertson, G., Schein, J., Chiu, R., et al. (2010). De novo assembly and analysis of RNA-Seq data. Nature Meth., 7, 909–912.
Rockman, M.V. and Kruglyak, L. (2006). Genetics of global gene expression. Nature Rev. Genet., 7, 862–872.
Salzman, J., Jiang, H. and Wong, W.H. (2011). Statistical modeling of RNA-Seq data. Statist. Sci., 26, 62–83.
Schadt, E.E., Lamb, J., Yang, X., et al. (2005). An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genet., 37, 710–717.
Shabalin, A.A. (2012). Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics, 28, 1353–1358.
Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2013). A sparse-group lasso. J. Comput. Graph. Statist., 22, 231–245.
Smyth, G.K. (2005). Limma: linear models for microarray data. In Gentleman, R., Carey, V., Dudoit, S., Irizarry, R. and Huber, W. (Eds.), Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, New York, NY, pp. 397–420.
Stojmirović, A. and Yu, Y.-K. (2009). ITM probe: analyzing information flow in protein networks. Bioinformatics, 25, 2447–2449.
Stojmirović, A. and Yu, Y.-K. (2012). Information flow in interaction networks II: channels, path lengths, and potentials. J. Comput. Biol., 19, 379–403.
Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika, 99, 879–898.
Sun, W. and Hu, Y. (2013). eQTL mapping using RNA-seq data. Statist. Biosci., 5, 198–219.
Suthram, S., Beyer, A., Karp, R.M., Eldar, Y. and Ideker, T. (2008). eQED: an efficient method for interpreting eQTL associations using protein networks. Molec. Syst. Biol., 4, 162.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Statist. Soc. Ser. B (Method.), 58, 267–288.
Trapnell, C., Williams, B.A., Pertea, G., et al. (2010). Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnol., 28, 511–515.
Tu, Z., Wang, L., Arbeitman, M.N., Chen, T. and Sun, F. (2006). An integrative approach for causal gene identification and gene regulatory pathway inference. Bioinformatics, 22, e489–e496.
Verbeke, L.P., Cloots, L., Demeester, P., Fostier, J. and Marchal, K. (2013). Epsilon: an eQTL prioritization framework using similarity measures derived from local networks. Bioinformatics, 29, 1308–1316.
Voevodski, K., Teng, S.-H. and Xia, Y. (2009). Spectral affinity in protein networks. BMC Syst. Biol., 3, 112.
Wang, X., Qin, L., Zhang, H., et al. (2015). A regularized multivariate regression approach for eQTL analysis. Statist. Biosci., 7, 129–146.
Witte, J.S. (2010). Genome-wide association studies and beyond. Annu. Rev. Publ. Health, 31, 9–20.
Xia, Z., Wen, J., Chang, C.-C. and Zhou, X. (2011). NSMAP: A method for spliced isoforms identification and quantification from RNA-Seq. BMC Bioinform., 12, 162.
Yang, C., Wang, L., Zhang, S. and Zhao, H. (2013). Accounting for non-genetic factors by low-rank representation and sparse regression for eQTL mapping. Bioinformatics, 29, 1026–1034.
Yin, J. and Li, H. (2011). A sparse conditional Gaussian graphical model for analysis of genetical genomics data. Ann. Appl. Statist., 5, 2630.
Zerbino, D.R. and Birney, E. (2008). Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res., 18, 821–829.
Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Statist. Soc. Ser. B (Statist. Method.), 67, 301–320.
Zou, W., Aylor, D.L. and Zeng, Z.-B. (2007). eQTL viewer: visualizing how sequence variation affects genome-wide transcription. BMC Bioinform., 8, 7.
