Make no little plans.
Attributed to Daniel Burnham, Chicago architect
Phylogenetic trees are getting large. Trees based on single loci have been constructed for more than 100 000 taxa (Price et al. 2010), and trees based on a handful of loci for more than 10 000 taxa (Goloboff et al. 2009; Smith et al. 2011a). Basic counting arguments show that the number of loci needed to reconstruct a tree accurately grows with the number of leaves, N, in the tree (Mossel and Steel 2005, p. 400). Whether this scaling follows the conjectured rate of log(N) or is worse, the need for genome-scale datasets is likely to increase. Fortunately, new sequence data are accumulating at an extraordinary pace, and the revolutionary impact of this growth on systematics has been noted many times (e.g. Goldman and Yang 2008). Perhaps more noteworthy is that taxon sampling has kept pace with advances in sequencing technology, so that phylogenetic datasets have been growing steadily in both dimensions. Figure 1.1 shows the expanding wave front of phylogenetic dataset size, a kind of ‘Moore's Law’ for phylogenomics.

This pattern undoubtedly has its limits. Goldman and Yang (2008) documented the exponential growth in the number of sequences in databases, but cautioned that molecular phylogenetic studies are accumulating at a less than exponential rate. This is probably due to a combination of two factors: the mean number of sequences per study has increased over time (Fig. 1.1), and samples of rare taxa inevitably become harder to obtain. Given the ‘hollow curve’ of species distributions, that is, the fact that most species are both geographically restricted and locally uncommon (McGill 2010), it is doubtful that sampling across taxa will be able to keep up with sampling across individual genomes. Nonetheless, as of March 2016 about 19% of described biodiversity had at least one sequence in GenBank (355 000 of 1.9 million described species).
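The counting argument attributed above to Mossel and Steel (2005) can be illustrated with a simplified pigeonhole bound. This is a sketch of the general idea under stated assumptions, not necessarily the derivation in that paper. Assume each character (site) takes one of r states, so an alignment of k characters on N taxa can take at most r^(N·k) distinct values, while there are (2N - 5)!! unrooted binary trees on N labelled leaves. If every tree is to be recoverable from some dataset, we need r^(N·k) ≥ (2N - 5)!!, i.e. k ≥ log((2N - 5)!!) / (N log r). Because log((2N - 5)!!) grows roughly like N log N, this lower bound grows roughly like log(N). The function names below are hypothetical, and the bound counts characters rather than loci.

```python
import math


def log_double_factorial(m: int) -> float:
    """Natural log of m!! = m * (m - 2) * ... down to 1 or 2; 0.0 for m <= 0."""
    return sum(math.log(j) for j in range(m, 0, -2))


def log_num_unrooted_trees(n_leaves: int) -> float:
    """Natural log of (2N - 5)!!, the number of unrooted binary trees on N >= 3 labelled leaves."""
    if n_leaves < 3:
        raise ValueError("need at least 3 leaves")
    return log_double_factorial(2 * n_leaves - 5)


def min_characters(n_leaves: int, n_states: int = 4) -> int:
    """Smallest k with n_states ** (N * k) >= number of unrooted trees.

    Assumption: a pure pigeonhole bound. It ignores noise and model
    parameters, so real data requirements are far larger.
    """
    if n_states < 2:
        raise ValueError("characters need at least 2 states")
    # Information needed to single out one tree (in nats).
    needed = log_num_unrooted_trees(n_leaves)
    # Maximum information one character column can carry across N taxa.
    per_character = n_leaves * math.log(n_states)
    return math.ceil(needed / per_character)


if __name__ == "__main__":
    for n in (10, 100, 10_000, 100_000):
        print(n, min_characters(n))
```

For example, comparing `min_characters(100)` with `min_characters(100_000)` shows how slowly this lower bound grows with N; it only sets a floor, since stochastic character evolution means actual reconstruction requires much more data than the bound suggests.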
There are many reasons to bring genome-scale data to bear on local problems in the Tree of Life, or to solidify its deep backbone with a small number of exemplars, but this paper focuses on the task of building large, species-rich, high-resolution phylogenies.