When the key biological molecules, proteins and nucleic acids, were first studied, they were characterized by their sizes, compositions and, for proteins, their antigenicity. These simple characters were often used to infer the relationships of the organisms from which the molecules were obtained, and the results seemed sensible. However, it was not until the amino acid sequences of proteins were determined and, more recently, the nucleotide sequences of nucleic acids, that the molecular basis of the relationships was shown to reside in the sequences and their three-dimensional structures.
In this chapter we outline some of the methods used for inferring the phylogenetic or other relationships of sequences. We distinguish the various components of analysis: the ways that nucleotide and amino acid sequences are recorded for analysis, the means by which those data can be transformed to improve the chances of discovering the true relationships, and the ways by which the relationships are inferred and displayed. We do not discuss the more traditional ways of comparing sequences indirectly using, for example, their composition, restriction fragment length polymorphism (RFLP) similarity or nearest neighbour frequency. However, it is worth emphasizing that, if such methods indicate relationships that correlate with those inferred from sequence data, then a much larger body of older, simpler data may become available and permit broader comparisons and extrapolations.
Aims of sequence analysis
The most appropriate way to analyse a sequence depends on the goal of the analysis.