Sequence length bounds for resolving a deep phylogenetic divergence

Fischer, Mareike
Steel, Mike
In evolutionary biology, genetic sequences carry with them a trace of the underlying tree that describes their evolution from a common ancestral sequence. The question of how many sequence sites are required to recover this evolutionary relationship accurately depends on the model of sequence evolution, the substitution rate, divergence times and the method used to infer phylogenetic history. A particularly challenging problem for phylogenetic methods arises when a rapid divergence event occurred in the distant past. We analyse an idealised form of this problem in which the terminal edges of a symmetric four-taxon tree are some factor ($p$) times the length of the interior edge. We determine an order $p^2$ lower bound on the growth rate for the sequence length required to resolve the tree (independent of any particular branch length). We also show that this rate of sequence length growth can be achieved by existing methods (including the simple 'maximum parsimony' method), and compare these order $p^2$ bounds with an order $p$ growth rate for a model that describes low-homoplasy evolution. In the final section, we provide a generic bound on the sequence length requirement for a more general class of Markov processes.
Comment: 13 pages, 1 figure
Quantitative Biology - Populations and Evolution