To study life at the molecular level, various types of data must be in a ‘human readable format’. Many tools and techniques can determine the sequences of nucleic acids and proteins, allowing us to draw parallels between what we physically observe and what is happening at the molecular level. These tools generate vast amounts of data, sometimes several gigabytes per experiment, all of which must be correctly stored and interpreted in order to be useful. Several related types of information also help us to understand the function of genes and proteins in the organism.
Bioinformatics is the discipline that deals with the massive and diverse data sets produced by molecular research. It concerns the storage, visualisation, manipulation, analysis and integration of biological information using computers. Initially, the biological information that bioinformaticists had to manage was the nucleotide and amino acid data derived from DNA and protein sequencing, but with today's new technologies and approaches there are many more diverse data types to manage.
A paradigm shift is starting to occur in molecular research. The emphasis is moving away from hypothesis-driven research and towards exploratory observation. These observations are typically made in a very high-throughput way, meaning that in some cases literally billions of data points are considered per experiment over a relatively short period of time. This would be impossible if all the observations had to be made by human observers, so computers are essential for data capture, management and analysis in experiments of this type.
A very good example of technology that requires computers for data capture, management and analysis is microarray technology. This technology enables the determination of mRNA expression levels for all genes in a sample at a given time point. It is used by comparing the gene expression levels between a test and a reference sample, and then identifying the genes that are up-regulated as well as those that are down-regulated. These genes are very interesting to examine, since (usually in combination) they determine how the phenotype of a test sample differs from that of the reference sample.
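The comparison between a test and a reference sample can be illustrated with a minimal Python sketch. It assumes that expression levels are already available as positive intensity values per gene, and it labels a gene as up- or down-regulated when its log2 fold change reaches a chosen cut-off. The function name `classify_genes`, the gene names, the intensity values and the cut-off of 1.0 (a two-fold change) are all hypothetical choices made for illustration; real microarray analyses also involve normalisation and statistical testing, which are omitted here.

```python
import math


def classify_genes(test, reference, threshold=1.0):
    """Label each gene by its log2 fold change (test versus reference).

    test, reference: dicts mapping gene name -> positive expression level.
    threshold: absolute log2 fold change needed to call a gene
               up- or down-regulated (1.0 means a two-fold change).
    """
    calls = {}
    for gene, test_level in test.items():
        ref_level = reference[gene]
        log2_fc = math.log2(test_level / ref_level)
        if log2_fc >= threshold:
            calls[gene] = "up"
        elif log2_fc <= -threshold:
            calls[gene] = "down"
        else:
            calls[gene] = "unchanged"
    return calls


# Hypothetical intensities in arbitrary units, not real measurements.
reference = {"geneA": 100.0, "geneB": 200.0, "geneC": 50.0}
test = {"geneA": 400.0, "geneB": 50.0, "geneC": 55.0}

calls = classify_genes(test, reference)
print(calls)
```

In this toy example geneA is four times higher in the test sample and geneB four times lower, so they are called up- and down-regulated respectively, while geneC changes too little to pass the cut-off. Working on a log2 scale treats increases and decreases symmetrically: a doubling and a halving are equally far from zero.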