  • Print publication year: 2015
  • Online publication date: July 2015

6 - Joining the dots: network analysis of gene perturbation data


How to link genotypes and phenotypes is a long-standing question in biology. High-throughput approaches are key technologies at the forefront of genetic research. They enable the analysis of a biological response to thousands of experimental perturbations and require tight collaboration between experimental and computational scientists. Perturbation studies and computational approaches have revolutionized research in functional genomics and genetics and promise to lay the foundation for personalized medicine. For these technologies, computation is as important as experimentation: genome-wide image-based RNA interference (RNAi) screens, for example, are only feasible because of computational techniques, and the skills to analyse the data have become as important as the skills to generate them.

Design and analysis of phenotyping screens depend on the number of genes perturbed and the richness of the phenotype observed (Figure 6.1). At one extreme are high-throughput screens with single reporters, e.g. a genome-wide screen for new components of a pathway. At the other extreme are perturbations of individual genes with very rich phenotypes, e.g. assessing the effects of a single gene perturbation on several molecular levels over time. Between these two extremes lie a variety of possible screen designs. Two widely used scenarios are small-scale perturbations (<20 genes) of a single target pathway with rich readouts, e.g. a global transcriptional profile, and medium-scale perturbations (hundreds of genes) with multi-parametric readouts, e.g. cell morphology or growth in different media. In the following we will discuss statistical and computational methodologies for functional analysis in all four scenarios.

Scenario 1: Genome-wide screens with single reporters

RNAi screens have been frequently and successfully applied for functional profiling of genes on a large scale (Boutros & Ahringer 2008). The vast majority of these applications use a single phenotype (e.g. cell viability, growth rate, activity of reporter constructs) to characterize the function of genes in specific biological pathways.
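As a rough illustration of how a single-reporter readout can be turned into a hit list, the sketch below scores each gene with a robust z-score (deviation from the plate median, scaled by the median absolute deviation). This is one common choice and not necessarily the method used in the chapter; the gene names, readout values, the function names `robust_z_scores` and `select_hits`, and the cut-off of 3 are all hypothetical.

```python
from statistics import median

def robust_z_scores(values):
    """Return (x - median) / (1.4826 * MAD) for each value.

    The factor 1.4826 makes the MAD comparable to a standard
    deviation when the data are approximately normal.
    """
    values = list(values)
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    scale = 1.4826 * mad
    return [(v - med) / scale for v in values]

def select_hits(readouts, threshold=3.0):
    """Return genes whose |robust z| meets the (assumed) threshold."""
    genes = list(readouts)
    scores = robust_z_scores(readouts[g] for g in genes)
    return {g: z for g, z in zip(genes, scores) if abs(z) >= threshold}

# Hypothetical normalized viability readouts, one value per knockdown.
readouts = {
    "geneA": 1.00, "geneB": 0.98, "geneC": 1.03, "geneD": 0.35,
    "geneE": 1.01, "geneF": 0.97, "geneG": 1.02,
}

hits = select_hits(readouts)
print(sorted(hits))
```

Median and MAD are used here instead of mean and standard deviation because strong hits are themselves outliers and would otherwise inflate the scale they are measured against.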

Ahmed, A. & Xing, E. P. (2009), ‘Recovering time-varying networks of dependencies in social and biological studies’, Proceedings of the National Academy of Sciences of the USA 106(29), 11878–11883.
Alexa, A., Rahnenfuhrer, J. & Lengauer, T. (2006), ‘Improved scoring of functional groups from gene expression data by decorrelating GO graph structure’, Bioinformatics 22(13), 1600–1607.
Anchang, B., Sadeh, M., Jacob, J., Tresch, A., Vlad, M. et al. (2009), ‘Modeling the temporal interplay of molecular signaling and gene expression by using dynamic nested effects models’, Proceedings of the National Academy of Sciences of the USA 106(16), 6447–6452.
Arora, S., Gonzales, I., Hagelstrom, R., Beaudry, C., Choudhary, A. et al. (2010), ‘RNAi phenotype profiling of kinases identifies potential therapeutic targets in Ewing's sarcoma’, Molecular Cancer 9(1), 218.
Ashburner, M., Ball, C., Blake, J., Botstein, D., Butler, H. et al. (2000), ‘Gene Ontology: tool for the unification of biology’, Nature Genetics 25(1), 25–29.
Bakal, C., Aach, J., Church, G. & Perrimon, N. (2007), ‘Quantitative morphological signatures define local signaling networks regulating cell morphology’, Science 316(5832), 1753–1756.
Baryshnikova, A., Costanzo, M., Kim, Y., Ding, H., Koh, J. et al. (2010), ‘Quantitative analysis of fitness and genetic interactions in yeast on a genome scale’, Nature Methods 7(12), 1017–1024.
Battle, A., Jonikas, M. C., Walter, P., Weissman, J. S. & Koller, D. (2010), ‘Automated identification of pathways from quantitative genetic interaction data’, Molecular Systems Biology 6, 379.
Bauer, S., Grossmann, S., Vingron, M. & Robinson, P. (2008), ‘Ontologizer 2.0: a multifunctional tool for GO term enrichment analysis and data exploration’, Bioinformatics 24(14), 1650–1651.
Beißbarth, T. & Speed, T. (2004), ‘GOstat: find statistically overrepresented Gene Ontologies within a group of genes’, Bioinformatics 20(9), 1464–1465.
Beisser, D., Klau, G., Dandekar, T., Muller, T. & Dittrich, M. (2010), ‘BioNet: an R-package for the functional analysis of biological networks’, Bioinformatics 26(8), 1129–1130.
Birmingham, A., Selfors, L., Forster, T., Wrobel, D., Kennedy, C. et al. (2009), ‘Statistical methods for analysis of high-throughput RNA interference screens’, Nature Methods 6(8), 569–575.
Booker, M., Samsonova, A. A., Kwon, Y., Flockhart, I., Mohr, S. E. et al. (2011), ‘False negative rates in Drosophila cell-based RNAi screens: a case study’, BMC Genomics 12, 50.
Boutros, M. & Ahringer, J. (2008), ‘The art and design of genetic screens: RNA interference’, Nature Reviews Genetics 9(7), 554–566.
Boutros, M., Brás, L. P. & Huber, W. (2006), ‘Analysis of cell-based RNAi screens’, Genome Biology 7(7), R66.
Boutros, M., Kiger, A. A., Armknecht, S., Kerr, K., Hild, M. et al. (2004), ‘Genome-wide RNAi analysis of growth and viability in Drosophila cells’, Science 303(5659), 832–835.
Breitkreutz, B., Stark, C., Reguly, T., Boucher, L., Breitkreutz, A. et al. (2008), ‘The BioGRID interaction database: 2008 update’, Nucleic Acids Research 36(Suppl 1), D637–D640.
Brideau, C., Gunter, B., Pikounis, B. & Liaw, A. (2003), ‘Improved statistical methods for hit selection in high-throughput screening’, Journal of Biomolecular Screening 8(6), 634–647.
Castro, M., Wang, X., Fletcher, M., Meyer, K. & Markowetz, F. (2012), ‘RedeR: R/Bioconductor package for representing modular structures, nested networks and multiple levels of hierarchical associations’, Genome Biology 13(4), R29.
Cheung, H. W., Cowley, G. S., Weir, B. A., Boehm, J. S., Rusin, S. et al. (2011), ‘Systematic investigation of genetic vulnerabilities across cancer cell lines reveals lineage-specific dependencies in ovarian cancer’, Proceedings of the National Academy of Sciences of the USA 108(30), 12372–12377.
Collins, S., Miller, K., Maas, N., Roguev, A., Fillingham, J. et al. (2007), ‘Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map’, Nature 446(7137), 806–810.
Costanzo, M., Baryshnikova, A., Bellay, J., Kim, Y., Spear, E. et al. (2010), ‘The genetic landscape of a cell’, Science 327(5964), 425.
Dadgostar, H., Zarnegar, B., Hoffmann, A., Qin, X., Truong, U. et al. (2002), ‘Cooperation of multiple signaling pathways in CD40-regulated gene expression in B lymphocytes’, Proceedings of the National Academy of Sciences of the USA 99(3), 1497–1502.
de Hoon, M., Imoto, S. & Miyano, S. (2002), ‘A comparison of clustering techniques for gene expression data’, Proceedings of the 10th International Conference on Intelligent Systems for Molecular Biology, Abstract 33A.
Dempster, A., Laird, N. & Rubin, D. (1977), ‘Maximum likelihood from incomplete data via the EM algorithm’, Journal of the Royal Statistical Society, Series B (Methodological) 39(1), 1–38.
Echeverri, C. & Perrimon, N. (2006), ‘High-throughput RNAi screening in cultured cells: a user's guide’, Nature Reviews Genetics 7(5), 373–384.
Echeverri, C., Beachy, P., Baum, B., Boutros, M., Buchholz, F. et al. (2006), ‘Minimizing the risk of reporting false positives in large-scale RNAi screens’, Nature Methods 3(10), 777–779.
Eisen, M., Spellman, P., Brown, P. & Botstein, D. (1998), ‘Cluster analysis and display of genome-wide expression patterns’, Proceedings of the National Academy of Sciences of the USA 95(25), 14863–14868.
Failmezger, H., Praveen, P., Tresch, A. & Fröhlich, H. (2013), ‘Learning gene network structure from time lapse cell imaging in RNAi knockdowns’, Bioinformatics 29(12), 1534–1540.
Falcon, S. & Gentleman, R. (2007), ‘Using GOstats to test gene lists for GO term association’, Bioinformatics 23(2), 257–258.
Farha, M. & Brown, E. (2010), ‘Chemical probes of Escherichia coli uncovered through chemical-chemical interaction profiling with compounds of known biological activity’, Chemistry & Biology 17(8), 852–862.
Friedman, N. (2004), ‘Inferring cellular networks using probabilistic graphical models’, Science 303(5659), 799–805.
Friedman, N., Linial, M., Nachman, I. & Pe'er, D. (2000), ‘Using Bayesian networks to analyze expression data’, Journal of Computational Biology 7(3–4), 601–620.
Fröhlich, H., Beißbarth, T., Tresch, A., Kostka, D., Jacob, J. et al. (2008a), ‘Analyzing gene perturbation screens with nested effects models in R and Bioconductor’, Bioinformatics 24(21), 2549–2550.
Fröhlich, H., Fellmann, M., Sueltmann, H., Poustka, A. & Beissbarth, T. (2007), ‘Large scale statistical inference of signaling pathways from RNAi and microarray data’, BMC Bioinformatics 8, 386.
Fröhlich, H., Fellmann, M., Sueltmann, H., Poustka, A. & Beissbarth, T. (2008b), ‘Estimating large-scale signaling networks through nested effect models with intervention effects from microarray data’, Bioinformatics 24(22), 2650–2656.
Fröhlich, H., Praveen, P. & Tresch, A. (2011), ‘Fast and efficient dynamic nested effects models’, Bioinformatics 27(2), 238–244.
Fuchs, F., Pau, G., Kranz, D., Sklyar, O., Budjan, C. et al. (2010), ‘Clustering phenotype populations by genome-wide RNAi and multiparametric imaging’, Molecular Systems Biology 6, 370.
Geyer, C. (2010), ‘Introduction to Markov chain Monte Carlo’, in S. Brooks, A. Gelman, G. Jones & X.-L. Meng, eds, Handbook of Markov chain Monte Carlo, CRC Press, Boca Raton, FL, pp. 3–48.
Green, R., Kao, H., Audhya, A., Arur, S., Mayers, J. et al. (2011), ‘A high-resolution C. elegans essential gene network based on phenotypic profiling of a complex tissue’, Cell 145(3), 470–482.
Hahne, F., Arlt, D., Sauermann, M., Majety, M., Poustka, A. et al. (2006), ‘Statistical methods and software for the analysis of high-throughput reverse genetic assays using flow cytometry readouts’, Genome Biology 7(8), R77.
Horn, T., Sandmann, T., Fischer, B., Axelsson, E., Huber, W. et al. (2011), ‘Mapping of signaling networks through synthetic genetic interaction analysis by RNAi’, Nature Methods 8(4), 341–346.
House, C. D., Vaske, C. J., Schwartz, A. M., Obias, V., Frank, B. et al. (2010), ‘Voltage-gated Na+ channel SCN5A is a key regulator of a gene transcriptional network that controls colon cancer invasion’, Cancer Research 70(17), 6957–6967.
Jeffreys, H. (1998), Theory of probability, 3rd edn, Oxford University Press.
Jensen, L. J., Kuhn, M., Stark, M., Chaffron, S., Creevey, C. et al. (2009), ‘String 8: a global view on proteins and their functional interactions in 630 organisms’, Nucleic Acids Research 37(Database issue), D412–D416.
Kaderali, L., Dazert, E., Zeuge, U., Frese, M. & Bartenschlager, R. (2009), ‘Reconstructing signaling pathways from RNAi data using probabilistic Boolean threshold networks’, Bioinformatics 25(17), 2229–2235.
Kessler, J., Kahle, K., Sun, T., Meerbrey, K., Schlabach, M. et al. (2012), ‘A SUMOylation-dependent transcriptional subprogram is required for Myc-driven tumorigenesis’, Science 335(6066), 348–353.
Li, C. & Wong, W. (2001), ‘Model-based analysis of oligonucleotide arrays: model validation, design issues and standard error application’, Genome Biology 2(8), 0032.
Liberzon, A., Subramanian, A., Pinchback, R., Thorvaldsdottir, H., Tamayo, P. et al. (2011), ‘Molecular signatures database (MSigDB) 3.0’, Bioinformatics 27(12), 1739–1740.
Lu, R., Markowetz, F., Unwin, R. D., Leek, J. T., Airoldi, E. M. et al. (2009), ‘Systems level dynamic analyses of fate change in murine embryonic stem cells’, Nature 462(7271), 358–362.
Maathuis, M. H., Colombo, D., Kalisch, M. & Bühlmann, P. (2010), ‘Predicting causal effects in large-scale systems from observational data’, Nature Methods 7(4), 247–248.
Madigan, D., York, J. & Allard, D. (1995), ‘Bayesian graphical models for discrete data’, International Statistical Review/Revue Internationale de Statistique 63(2), 215–232.
Malo, N., Hanley, J., Cerquozzi, S., Pelletier, J. & Nadon, R. (2006), ‘Statistical practice in high-throughput screening data analysis’, Nature Biotechnology 24(2), 167–175.
Mani, R., St Onge, R., Hartman, J., Giaever, G. & Roth, F. (2008), ‘Defining genetic interaction’, Proceedings of the National Academy of Sciences of the USA 105(9), 3461–3466.
Markowetz, F. (2010), ‘How to understand the cell by breaking it: network analysis of gene perturbation screens’, PLoS Computational Biology 6(2), e1000655.
Markowetz, F. & Spang, R. (2007), ‘Inferring cellular networks - a review’, BMC Bioinformatics 8(Suppl 6), S5.
Markowetz, F., Bloch, J. & Spang, R. (2005a), ‘Non-transcriptional pathway features reconstructed from secondary effects of RNA interference’, Bioinformatics 21(21), 4026–4032.
Markowetz, F., Grossmann, S. & Spang, R. (2005b), ‘Probabilistic soft interventions in conditional Gaussian networks’, Proceedings of 10th International Workshop on Artificial Intelligence and Statistics.
Markowetz, F., Kostka, D., Troyanskaya, O. G. & Spang, R. (2007), ‘Nested effects models for high-dimensional phenotyping screens’, Bioinformatics 23(13), i305–i312.
Markowetz, F., Mulder, K. W., Airoldi, E. M., Lemischka, I. R. & Troyanskaya, O. G. (2010), ‘Mapping dynamic histone acetylation patterns to gene expression in nanog-depleted murine embryonic stem cells’, PLoS Computational Biology 6(12), e1001034.
Merico, D., Isserlin, R., Stueker, O., Emili, A. & Bader, G. (2010), ‘Enrichment map: a network-based method for gene-set enrichment visualization and interpretation’, PLoS One 5(11), e13984.
Mulder, K. W., Wang, X., Escriu, C., Ito, Y., Schwarz, R. F. et al. (2012), ‘Diverse epigenetic strategies interact to control epidermal differentiation’, Nature Cell Biology 14(7), 753–763.
Müller, P., Kuttenkeuler, D., Gesellchen, V., Zeidler, M. P. & Boutros, M. (2005), ‘Identification of JAK/STAT signalling components by genome-wide RNA interference’, Nature 436(7052), 871–875.
Murphy, K. (2002), ‘Dynamic Bayesian networks: representation, inference and learning’, PhD thesis, University of California - Berkeley.
Niederberger, T., Etzold, S., Lidschreiber, M., Maier, K., Martin, D. et al. (2012), ‘MC EMiNEM maps the interaction landscape of the mediator’, PLoS Computational Biology 8(6), e1002568.
Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. et al. (1999), ‘KEGG: Kyoto encyclopedia of genes and genomes’, Nucleic Acids Research 27(1), 29–34.
Orvedahl, A., Sumpter Jr, R., Xiao, G., Ng, A., Zou, Z. et al. (2011), ‘Image-based genome-wide siRNA screen identifies selective autophagy factors’, Nature 480(7375), 113–117.
Pearl, J. (1988), Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, CA.
Pearl, J. (2000), Causality: models, reasoning, and inference, Cambridge University Press.
Pe'er, D. (2005), ‘Bayesian network analysis of signaling networks: a primer’, Science STKE 2005(281), l4.
Pe'er, D., Regev, A., Elidan, G. & Friedman, N. (2001), ‘Inferring subnetworks from perturbed expression profiles’, Bioinformatics 17(Suppl 1), S215–S224.
Pelz, O., Gilsdorf, M. & Boutros, M. (2010), ‘web-cellHTS2: a web-application for the analysis of high-throughput screening data’, BMC Bioinformatics 11(1), 185.
Rung, J., Schlitt, T., Brazma, A., Freivalds, K. & Vilo, J. (2002), ‘Building and analysing genome-wide gene disruption networks’, Bioinformatics 18(Suppl 2), S202–S210.
Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D. A. & Nolan, G. P. (2005), ‘Causal protein-signaling networks derived from multiparameter single-cell data’, Science 308(5721), 523–529.
Shimoni, Y., Fink, M. Y., Choi, S.-G. & Sealfon, S. C. (2010), ‘Plato's cave algorithm: inferring functional signaling networks from early gene expression shadows’, PLoS Computational Biology 6(6), e1000828.
Smyth, G. K. (2005), ‘Limma: linear models for microarray data’, in R. Gentleman, V. Carey, S. Dudoit, R. Irizarry & W. Huber, eds, Bioinformatics and computational biology solutions using R and Bioconductor, Springer, New York, pp. 397–420.
Song, L., Kolar, M. & Xing, E. P. (2009), ‘Time-varying dynamic Bayesian networks’, Advances in Neural Information Processing Systems 22, 1732–1740.
Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L. et al. (2005), ‘Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles’, Proceedings of the National Academy of Sciences of the USA 102(43), 15545–15550.
Suzuki, R. & Shimodaira, H. (2006), ‘Pvclust: an R package for assessing the uncertainty in hierarchical clustering’, Bioinformatics 22(12), 1540.
Tong, A., Lesage, G., Bader, G., Ding, H., Xu, H. et al. (2004), ‘Global mapping of the yeast genetic interaction network’, Science 303(5659), 808.
Tresch, A. & Markowetz, F. (2008), ‘Structure learning in nested effects models’, Statistical Applications in Genetics and Molecular Biology 7(1), 9.
Vaske, C. J., House, C., Luu, T., Frank, B., Yeang, C.-H. et al. (2009), ‘A factor graph nested effects model to identify networks from genetic perturbations’, PLoS Computational Biology 5(1), e1000274.
Wagner, A. (2001), ‘How to reconstruct a large genetic network from n gene perturbations in fewer than n² easy steps’, Bioinformatics 17(12), 1183–1197.
Wang, X., Castro, M. A., Mulder, K. W. & Markowetz, F. (2012), ‘Posterior association networks and functional modules inferred from rich phenotypes of gene perturbations’, PLoS Computational Biology 8(6), e1002566.
Wang, X., Terfve, C., Rose, J. C. & Markowetz, F. (2011), ‘HTSanalyzeR: an R/Bioconductor package for integrated network analysis of high-throughput screens’, Bioinformatics 27(6), 879–880.
Wang, X., Yuan, K., Hellmayr, C., Liu, W. & Markowetz, F. (2014), ‘Reconstructing evolving signalling networks by hidden Markov nested effects models’, Annals of Applied Statistics 8(1), 448–480.
Zhang, J., Chung, T. & Oldenburg, K. (1999), ‘A simple statistical parameter for use in evaluation and validation of high throughput screening assays’, Journal of Biomolecular Screening 4(2), 67–73.
Zhang, X., Yang, X., Chung, N., Gates, A., Stec, E. et al. (2006), ‘Robust statistical methods for hit selection in RNA interference high-throughput screening experiments’, Pharmacogenomics 7(3), 299–309.