THEORY
The problem of substitution saturation
The accuracy of phylogenetic reconstruction depends mainly on (1) sequence quality, (2) the correct identification of homologous sites by sequence alignment, (3) the regularity of the substitution process, e.g. stationarity along different lineages, absence of heterotachy and little variation in the substitution rate among sites, (4) the consistency, efficiency and low bias of the estimation method, e.g. a method not plagued by the long-branch attraction problem, and (5) sequence divergence, i.e. sequences neither so conserved that they contain few substitutions nor so diverged that they have experienced substantial substitution saturation. This chapter deals with assessing substitution saturation with the software DAMBE, a Windows program featuring a variety of analytical methods for molecular sequence analysis (Xia, 2001; Xia & Xie, 2001).
Substitution saturation decreases the phylogenetic information contained in sequences and has plagued phylogenetic analyses involving deep branches, such as those among major arthropod groups (Lopez et al., 1999; Philippe & Forterre, 1999; Xia et al., 2003). In the extreme case in which sequences have experienced full substitution saturation, the similarity between the sequences will depend entirely on the similarity in nucleotide frequencies (Lockhart et al., 1992; Steel et al., 1993; Xia, 2001, pp. 49–58; Xia et al., 2003), which often does not reflect phylogenetic relationships.
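The dependence on nucleotide frequencies at full saturation can be illustrated with a small numerical sketch. It is not part of DAMBE, and all function names and frequency values below are hypothetical. It assumes that fully saturated sequences behave like independent random draws from each lineage's nucleotide frequencies, in which case the expected proportion of identical sites between two sequences is the sum over nucleotides of the product of their frequencies.

```python
import random

NUCLEOTIDES = "ACGT"


def expected_identity(freqs_a, freqs_b):
    """Expected proportion of matching sites between two independent
    random sequences drawn from the given nucleotide frequencies."""
    return sum(freqs_a[n] * freqs_b[n] for n in NUCLEOTIDES)


def random_sequence(freqs, length, rng):
    """Draw a sequence whose sites are independent samples from freqs."""
    weights = [freqs[n] for n in NUCLEOTIDES]
    return "".join(rng.choices(NUCLEOTIDES, weights=weights, k=length))


def observed_identity(seq_a, seq_b):
    """Proportion of aligned sites at which the two sequences agree."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)


# Hypothetical frequency profiles, chosen only for illustration.
at_rich = {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}
gc_rich = {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1}

if __name__ == "__main__":
    rng = random.Random(1)
    seq1 = random_sequence(at_rich, 20000, rng)
    seq2 = random_sequence(at_rich, 20000, rng)
    seq3 = random_sequence(gc_rich, 20000, rng)

    # Two unrelated AT-rich sequences look more alike than an
    # AT-rich and a GC-rich one, with no shared history at all.
    print(f"expected AT vs AT: {expected_identity(at_rich, at_rich):.2f}")
    print(f"expected AT vs GC: {expected_identity(at_rich, gc_rich):.2f}")
    print(f"simulated AT vs AT: {observed_identity(seq1, seq2):.3f}")
    print(f"simulated AT vs GC: {observed_identity(seq1, seq3):.3f}")
```

Under these assumed frequencies, two unrelated AT-rich sequences are expected to match at 34% of sites, whereas an AT-rich and a GC-rich sequence match at only 16%, so a similarity-based method would tend to group sequences by base composition rather than by ancestry.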