3 - Statistical analysis of gene expression data

Published online by Cambridge University Press:  05 September 2009

David A. Elashoff
Affiliation:
Department of Biostatistics, UCLA School of Public Health, Los Angeles, CA, USA
Wolf-Karsten Hofmann
Affiliation:
Charite-University Hospital Benjamin Franklin, Berlin
Summary

Abstract

Statistical analysis of the complex data sets produced in DNA microarray experiments presents substantial challenges to the experimenter and statistician alike. Because of the large number of genes and the small number of samples, traditional statistical methods alone are typically not sufficient to draw appropriate conclusions. This chapter introduces the reader to the basic concepts in the analysis of microarray data and summarizes some of the most commonly used techniques. A microarray data analysis can be divided into four distinct components: data preprocessing and quality control, identification of differentially expressed genes, unsupervised clustering and data visualization, and supervised classification and prediction. As the science of microarray analysis has advanced, a wide variety of methods have been developed to address each of these components. Guidance is provided on the situations in which the various techniques can be applied most productively, and cautions are given about cases in which these techniques will give inappropriate answers.
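To illustrate why gene-by-gene testing at a conventional significance level is not sufficient on its own when there are many genes, the sketch below computes the expected number of false positives under a global null and then applies the Benjamini-Hochberg step-up procedure, one widely used way to control the false discovery rate. It is a minimal illustration and not necessarily the approach taken in the chapter; the function names, the gene count of 10,000 and the p-values are hypothetical choices made for this example.

```python
def expected_false_positives(n_genes, alpha):
    """Expected number of false positives if no gene is truly differentially expressed."""
    return n_genes * alpha


def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the
    Benjamini-Hochberg step-up procedure at false discovery rate q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at most k_max.
    return sorted(order[:k_max])


if __name__ == "__main__":
    # Hypothetical array with 10,000 genes, each tested at alpha = 0.05.
    print(expected_false_positives(10_000, 0.05))

    # Hypothetical p-values for ten genes.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(p, q=0.05))
```

With these illustrative numbers, unadjusted testing would be expected to flag several hundred genes by chance alone, while the step-up rule keeps only the genes whose ranked p-values fall under the line (k / m) * q.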

Introduction

The growth of microarray research has generated considerable interest within the statistical and computational communities in developing methods to address these problems. The most common scientific questions asked in a microarray experiment are, “Which genes are correlated with specific characteristics of the samples?” and “Are there specific patterns of gene expression, or combinations of multiple genes, that can accurately predict the sample characteristics?”

Type: Chapter
Information: Gene Expression Profiling by Microarrays: Clinical Implications, pp. 47-79
Publisher: Cambridge University Press
Print publication year: 2006
