## The allelic partition for coalescent point processes

##### Authors

Lambert, Amaury

##### Description

Assume that individuals alive at time $t$ in some population can be ranked in
such a way that the coalescence times between consecutive individuals are
i.i.d. The ranked sequence of these branches is called a coalescent point
process. We have shown in a previous work that splitting trees are important
instances of such populations. Here, individuals are given DNA sequences, and
for a sample of $n$ DNA sequences belonging to distinct individuals, we
consider the number $S_n$ of polymorphic sites (sites at which at least two
sequences differ), and the number $A_n$ of distinct haplotypes (sequences
differing at one site at least). It is standard to assume that mutations arrive
at constant rate (on germ lines), and never hit the same site on the DNA
sequence. We study the mutation pattern associated with coalescent point
processes under this assumption. Here, $S_n$ and $A_n$ grow linearly as $n$
grows, with an explicit rate. However, when the branch lengths have infinite
expectation, $S_n$ grows more rapidly, e.g. as $n \ln(n)$ for critical
birth--death processes. Then, we study the frequency spectrum of the sample,
that is, the numbers of polymorphic sites/haplotypes carried by $k$ individuals
in the sample. These numbers are also shown to grow linearly with sample size,
and we provide simple explicit formulae for mutation frequencies and haplotype
frequencies. For critical birth--death processes, mutation frequencies are
given by the harmonic series and haplotype frequencies by Fisher's logarithmic
series.
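
The model described above can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the construction used in the paper: coalescence times between consecutive individuals are drawn i.i.d. from a user-supplied law, the first individual's lineage is followed back only to the deepest coalescence time in the sample, and mutations fall on the resulting tree as a Poisson process with rate `theta`, each at a new site. Under these assumptions every mutation on the tree is carried by some but not all sampled individuals, so the count plays the role of $S_n$. The function names, `theta`, and the exponential law in the usage note are hypothetical choices made for illustration.

```python
import random


def sample_tree_length(n, draw_h, rng):
    """Total branch length of the tree spanned by n consecutive individuals.

    Assumption of this sketch: the n - 1 coalescence times are i.i.d. draws
    from draw_h, and the first individual's lineage is kept only up to the
    deepest coalescence time (the root of the sample tree). Requires n >= 2.
    """
    h = [draw_h(rng) for _ in range(n - 1)]
    return sum(h) + max(h)


def poisson(mean, rng):
    """Poisson draw obtained by counting rate-1 exponential arrivals before `mean`."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def segregating_sites(n, theta, draw_h, rng):
    """Number of mutations on the sample tree (infinitely-many-sites assumption)."""
    return poisson(theta * sample_tree_length(n, draw_h, rng), rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Exponential branch lengths with mean 1: a finite-expectation toy case.
    for n in (1_000, 10_000):
        s_n = segregating_sites(n, 1.0, lambda r: r.expovariate(1.0), rng)
        print(n, s_n / n)
```

In this toy setup with a finite-mean branch law, the ratio `s_n / n` settles near `theta` times the mean branch length as `n` grows, which is the linear regime. The counting-based Poisson sampler costs time proportional to the tree length, so a heavy-tailed branch law would call for a different sampler.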

##### Keywords

Mathematics - Probability, 92D10, 60-06, 60G10, 60G51, 60G55, 60G70, 60J10, 60J80, 60J85.