INTRODUCTION
Over the last decade, we have witnessed a massive expansion in demand for and access to low-cost high-throughput sequencing of nucleic acids, which can be predominantly attributed to the advent and establishment of so-called next-generation sequencing technologies (NGS; see Section 20.2.2). The availability and progressively decreasing costs of such technologies have been accompanied by an ever-increasing number of nucleic acid and protein sequences being deposited in public repositories and, in turn, by the need to draw biologically meaningful information or interpretations from these data. As a consequence, the discipline of bioinformatics has become instrumental in many areas of biology and, in particular, molecular biology.
Bioinformatics can be defined as a ‘fusion’ of biology and informatics, which includes applied mathematics, computer science, information technology and statistics. This multi-disciplinary field of research includes two major components, one aimed at developing computational tools and algorithms to facilitate storage, analysis and manipulation of sequence data, and one aimed at applying such tools to the discovery of new biological insights on the organism(s) under consideration. Researchers involved in the field of bioinformatics comprise both algorithm or software developers and end-users. While the main interest of the first group lies in writing sequence analysis programs and tools (programming, often called ‘coding’), the second wishes to apply these tools to answer questions of biological relevance. Within this latter group, experienced end-users often download and maintain programs on their private personal computers or servers, analyse a number of sequences (thousands to millions) simultaneously, have a working knowledge of programming languages and are therefore skilled in the use of command-line-based software. On the other hand, occasional users mainly deal with a limited number of sequences and thus prefer the use of ‘user-friendly’, web-server-based tools, which often offer a reduced set of options and a limited capacity when compared with the corresponding downloadable software packages. This chapter intends to provide an overview of the basic methods and bioinformatics resources available for the analysis of nucleic acid and protein sequences, and is primarily addressed to occasional users. For details on bioinformatic analyses of large-scale sequence datasets, such as those generated by NGS technologies, the reader is referred to Chapter 20, while programmatic or script access to programs will require more advanced programming skills, for example using the Python language (see Chapter 18).