Finding the distribution of the site frequencies, the number of segregating sites and the tail statistics using phase-type distributions.
SiteFrequencies(n, lambda, i = NULL, nSegSites = FALSE, tailStat = FALSE)
Argument | Description |
---|---|
n | the sample size (\(n \geq 3\)) |
lambda | the non-negative mutation rate |
i | either the number of the site frequency that should be considered or the number of the first term of the tail statistic. In both cases \(1 \leq i \leq n-1\). |
nSegSites | a logical value indicating whether the function should compute the distribution of the number of segregating sites \((S_{Total} = \xi_1 + \dots + \xi_{n-1})\). If TRUE, any value of \(i\) will be ignored. Defaults to FALSE. |
tailStat | a logical value indicating whether the function should compute the distribution of the tail statistic \((S_{i+} = \xi_i + \dots + \xi_{n-1})\). If TRUE, \(i\) will determine the first term of this statistic. Defaults to FALSE. |
Asger Hobolth, Arno Siri-Jégousse, Mogens Bladt (2019): Phase-type distributions in population genetics. Theoretical Population Biology, 127, pp. 16-32.
If nSegSites = FALSE and tailStat = FALSE, the function returns the phase-type representation of the \(i\)'th site frequency \((\xi_i)\) plus one. If nSegSites = TRUE, the function returns the phase-type representation of the total number of segregating sites plus one. If tailStat = TRUE, the function returns the representation of the tail statistic (whose first term is determined by \(i\)) plus one.
In all three cases, the returned object is of type discphasetype.
This function can be used to compute the discrete phase-type representation of the site frequencies \(\xi_i + 1\), for all \(i\) in \(1, \dots, n-1\), the total number of segregating sites \(S_{Total} + 1\) and the tail statistic \(S_{i+} + 1\).
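The reason for adding one to the site frequency is that the support of a discrete phase-type distribution is the natural numbers excluding zero. Immediate absorption is not possible, so a phase-type distributed variable can never be zero. By adding one, we allow the site frequency itself to be zero: the event \(\xi_i = k\) corresponds to the value \(k + 1\) of the phase-type variable.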
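The shift by one can be undone by hand when probabilities of the site frequency itself are needed. The following is a minimal sketch in base R, not part of the package: it assumes a representation with components initDist and P_Mat as printed in the examples, and uses the standard discrete phase-type formula \(P(\tau = k + 1) = \pi P^k p\) with exit vector \(p = (I - P)e\). The helper name shifted_pmf is hypothetical, and the matrix entries are copied (rounded) from the example output for n=4, lambda=1, i=2.

```r
# Representation copied from the printed output of
# SiteFrequencies(n=4, lambda=1, i=2); entries are rounded as printed.
initDist <- matrix(c(1, 0), nrow = 1)
P_Mat <- matrix(c(0.25, 0.1666667,
                  0.00, 0.6666667), nrow = 2, byrow = TRUE)

# Hypothetical helper (not a package function):
# returns P(xi = k) = P(tau = k + 1) for k = 0, ..., kmax.
shifted_pmf <- function(initDist, P_Mat, kmax) {
  exit <- 1 - rowSums(P_Mat)          # probability of absorption from each state
  state <- initDist                   # distribution over transient states
  probs <- numeric(kmax + 1)
  for (k in 0:kmax) {
    probs[k + 1] <- as.numeric(state %*% exit)
    state <- state %*% P_Mat          # advance one step without absorption
  }
  names(probs) <- 0:kmax
  probs
}

pmf_xi2 <- shifted_pmf(initDist, P_Mat, kmax = 5)
print(pmf_xi2)
```

The first entry of pmf_xi2 is the probability that \(\xi_2 = 0\), which is exactly the probability of absorption in the first step of the shifted variable.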
Note that the package also includes the function dSegregatingSites, which computes the density function of the number of segregating sites for a given sample size \(n\), a mutation parameter \(\theta\) and a non-negative vector of quantiles \(k\).
SiteFrequencies(n=4, lambda=1, i=2)
#> $initDist
#>      [,1] [,2]
#> [1,]    1    0
#>
#> $P_Mat
#>      [,1]      [,2]
#> [1,] 0.25 0.1666667
#> [2,] 0.00 0.6666667
#>
#> attr(,"class")
#> [1] "discphasetype"

SiteFrequencies(n=4, lambda=1, nSegSites=TRUE)
#> $initDist
#> [1] 1 0 0 0
#>
#> $P_Mat
#>      [,1] [,2]      [,3]       [,4]
#> [1,]  0.4  0.3 0.1333333 0.06666667
#> [2,]  0.0  0.5 0.2222222 0.11111111
#> [3,]  0.0  0.0 0.6666667 0.00000000
#> [4,]  0.0  0.0 0.0000000 0.66666667
#>
#> attr(,"class")
#> [1] "discphasetype"

SiteFrequencies(n=4, lambda=1, i=2, tailStat=TRUE)
#> $initDist
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#>
#> $P_Mat
#>      [,1] [,2]      [,3]
#> [1,] 0.25 0.25 0.1666667
#> [2,] 0.00 0.50 0.0000000
#> [3,] 0.00 0.00 0.6666667
#>
#> attr(,"class")
#> [1] "discphasetype"
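As a rough illustration of how such a representation can be used, the sketch below computes the mean number of segregating sites from the nSegSites=TRUE output above. It relies only on base R and on the standard mean formula for a discrete phase-type variable, \(E[\tau] = \pi (I - P)^{-1} e\); the matrix entries are copied (rounded) from the printed output, and the variable names mean_tau and mean_S are introduced here for illustration only.

```r
# Representation copied from the printed output of
# SiteFrequencies(n=4, lambda=1, nSegSites=TRUE); entries are rounded as printed.
initDist <- c(1, 0, 0, 0)
P_Mat <- matrix(c(0.4, 0.3, 0.1333333, 0.06666667,
                  0.0, 0.5, 0.2222222, 0.11111111,
                  0.0, 0.0, 0.6666667, 0.00000000,
                  0.0, 0.0, 0.0000000, 0.66666667),
                nrow = 4, byrow = TRUE)

# Mean of the shifted variable tau = S_Total + 1:
# solve (I - P) x = e, then take the inner product with initDist.
ones <- rep(1, nrow(P_Mat))
mean_tau <- as.numeric(initDist %*% solve(diag(nrow(P_Mat)) - P_Mat, ones))

# Undo the shift by one to obtain the mean of S_Total itself.
mean_S <- mean_tau - 1
print(mean_S)
```

For these inputs mean_S is approximately 3.67, up to the rounding of the printed matrix entries.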