Electronic Databases in the Life Sciences
During the past ten years, the life sciences (i.e., biology, medicine, and pharmaceutical research) have evolved from largely low-throughput, observational disciplines into primarily high-throughput, data-driven disciplines. In other words, the life sciences are becoming a “data science.” Thanks to advances in DNA sequencing, medical imaging, robotic sample handling, and high-throughput screening, it is now possible to generate as much data in a day-long experiment as might once have taken an entire scientific career to collect. For instance, a single eight-hour sequencing run on a DNA pyrosequencer can generate enough sequence data to fill a 1,000-page book (1, 2). The resulting genome sequence could then be automatically annotated in a few hours, yielding an enormous volume of information that could easily fill ten large telephone books (3, 4).
Our capacity to generate gigabytes of information on a daily basis is having a profound impact on the way that scientific information is disseminated. Although most scientific data are still presented in scientific journals and most high-level scientific knowledge is still published in textbooks, it is becoming increasingly obvious that the paper-publishing industry cannot keep up with the pace of scientific advancement or with the quantity of data that the scientific community would like to publish. Fortunately, the World Wide Web (i.e., the Web) has come to the rescue. The Web makes it possible to publish and disseminate huge quantities of information quickly and inexpensively. Not only has the Web helped to save scientific publishing, it has also led to the development of a new and very important kind of scientific archive: the electronic database. Electronic databases are Web-accessible archives that contain scientifically important data that are either too voluminous to publish in a book or journal or stored in a format that is incompatible with paper publication. Electronic databases such as GenBank (5), the Protein Data Bank (6), or PubMed allow information to be continuously updated by the thousands of scientists or dozens of curators who deposit data into these resources. Electronic databases also allow their data to be searched, accessed, or displayed in ways that were simply not possible with a paper journal or a leather-bound book. Indeed, the emergence of electronic, Web-accessible databases has to be considered one of the more significant developments in the life sciences.