Network discovery is of primary interest in many scientific domains. It is especially challenging in the biological domain because (1) such networks are not directly observable in experiments; (2) such networks are dynamic, i.e., different parts of the network are activated at different times and under different conditions; and (3) the increasingly available biological data are often big (volume), heterogeneous (variety), and error-prone (veracity). There is an urgent need for new methods, algorithms, and tools to discover networks from big biological data. In this chapter, we consider two alternative assumptions, each leading to an approach to network discovery from big biological data. (1) The true network topology is described by a distribution over candidate topologies. The challenge is that the number of possible topologies is exponential, which makes the distribution computationally intractable to characterize. Our strategy, gene set Gibbs sampling (GSGS), is to draw sample topologies and use them to infer the true topology, an approximate learning approach that falls within the stochastic algorithm framework. (2) The true network topology is deterministic. The challenge is the large search space, for which we design an artificial intelligence algorithm, gene set simulated annealing (GSSA), to explore the space of network structures efficiently. We use both simulated and real-world data to demonstrate the performance of our approaches against selected competing approaches.
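To make the second idea concrete, the following Python sketch shows generic simulated annealing over directed edge sets. It is not the GSSA algorithm developed in this chapter: the function names, the geometric cooling schedule, the single-edge toggle move, and the toy score (negative Hamming distance to a hidden topology) are all assumptions chosen only for illustration. In practice the score would be computed from data rather than from a known target.

```python
import math
import random


def simulated_annealing(nodes, score, n_steps=5000, t_start=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing over directed edge sets (illustrative only)."""
    rng = random.Random(seed)
    # Every ordered pair of distinct nodes is a candidate directed edge.
    candidates = [(u, v) for u in nodes for v in nodes if u != v]
    current = frozenset()
    current_score = score(current)
    best, best_score = current, current_score
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        t = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        # Propose a neighbor by toggling one randomly chosen edge.
        edge = rng.choice(candidates)
        proposal = current ^ {edge}
        proposal_score = score(proposal)
        delta = proposal_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = proposal, proposal_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score


# Hypothetical hidden topology, used only to define a toy score.
TRUE_EDGES = frozenset({("A", "B"), ("B", "C"), ("C", "D")})


def toy_score(edges):
    """Negative number of edges that differ from the hidden topology."""
    return -len(edges ^ TRUE_EDGES)


best, best_score = simulated_annealing(list("ABCD"), toy_score)
print(sorted(best), best_score)
```

Early in the run the high temperature lets the search accept worse topologies and so escape poor regions; as the temperature falls, the procedure behaves increasingly like greedy hill climbing.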
The past decade has witnessed a tremendous explosion in the amount of data generated through high-throughput molecular profiling technologies such as microarrays and next-generation sequencing. Big molecular profiling datasets provide a high-resolution view of biological systems and allow scientists to interrogate the biomolecular activities of tens of thousands of genes simultaneously. However, challenges remain in analyzing such data and gaining meaningful insights into biomolecular interaction and regulation mechanisms. These mechanisms are often understood through the inference of biological networks using computational systems biology approaches. A wide range of methods has been proposed in the literature for inferring the structure of different types of biological networks, such as gene regulatory networks, protein–protein interaction networks, and signaling networks. These methods represent networks as Bayesian networks [1, 2], probabilistic Boolean networks (PBNs) [3, 4], mutual information networks [5–7], and graphical Gaussian models [8–11], among other approaches [12–16].