Data visualization refers to techniques for visually presenting complex data sets. Typical goals include displaying multiple data dimensions simultaneously, connecting related data points across data sets, and revealing data distribution patterns. These techniques are of great value for data processing, data analysis, and data presentation.
Genomics and functional genomics are the major driving forces behind the development and use of visualization tools in biology. Following the completion of the genome sequencing projects for humans and other model organisms around the beginning of this century, the number of known genes has jumped to tens of thousands per species. A single expression profiling microarray experiment can generate millions of data points. The sheer size of these data sets, together with the need to integrate different data sources in analyses, prompted significant research and development work by both academic and industrial bioinformaticians. As a result, many visualization methods, proposals, and tools for biological data have been developed. This chapter describes the problems and solutions in visualizing three basic genomics/functional genomics data types, which are also the largest and therefore the most challenging. The first two sections discuss the visualization of sequence data and of pathway/gene network data, two data types specific to genomics and other biological fields. The third section reviews visualization methods for numeric data, such as expression profiling data, proteomic data, and genotyping data. Most of the techniques in that section can also be applied to other areas, although some topics, such as viewing numeric data in the context of genomes or pathways, remain biology-specific.
Sequence and genomes
The genome is the complete set of genetic material of an organism, including genes, regulatory and replication-related sequences, and non-functional intergenic regions. With the exception of RNA viruses, long linear or circular DNA molecules form the biochemical basis of the genome and store all of its genetic information. Genome visualization refers to the visual display of DNA sequences and their associated annotations. Depending on their purpose, genome visualization tools fall into two categories: sequence viewers, which display sequences and annotations, and genome alignment viewers, which compare different genomes.
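To make the idea of a sequence viewer more concrete, the sketch below renders a DNA sequence in fixed-width blocks and draws one annotation track per feature underneath, using ">" for forward-strand and "<" for reverse-strand features. It is a minimal text-only illustration under stated assumptions: the `Feature` class, the `render_tracks` function, and the 0-based, end-exclusive coordinate convention are hypothetical choices made for this example, not the design of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical annotation: a named interval (0-based, end-exclusive)."""
    name: str
    start: int
    end: int
    strand: str  # "+" for forward strand, "-" for reverse strand


def render_tracks(seq: str, features: list[Feature], width: int = 60) -> list[str]:
    """Render the sequence in blocks of `width` bases, each followed by
    one text track per annotation feature that overlaps the block."""
    lines = []
    for block_start in range(0, len(seq), width):
        block_end = min(block_start + width, len(seq))
        # Sequence line, labeled with the 1-based position of its first base
        lines.append(f"{block_start + 1:>8} {seq[block_start:block_end]}")
        for f in features:
            # Clip the feature interval to the current block
            lo = max(f.start, block_start)
            hi = min(f.end, block_end)
            if lo >= hi:
                continue  # feature does not overlap this block
            arrow = ">" if f.strand == "+" else "<"
            track = " " * (lo - block_start) + arrow * (hi - lo)
            lines.append(f"{f.name:>8} {track}")
    return lines


if __name__ == "__main__":
    demo_seq = "ACGT" * 10  # a made-up 40-base sequence
    demo_features = [Feature("geneA", 0, 21, "+"), Feature("orf2", 30, 35, "-")]
    print("\n".join(render_tracks(demo_seq, demo_features, width=20)))
```

Graphical viewers follow the same principle with richer drawing: a coordinate axis shared by the sequence and every annotation track, so that features can be read off against sequence positions at any zoom level.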