Finding the distribution of the site frequencies, the number of segregating sites and the tail statistics using phase-type distributions.

SiteFrequencies(n, lambda, i = NULL, nSegSites = FALSE,
  tailStat = FALSE)

Arguments

n

the sample size (>=3)

lambda

the non-negative mutation rate

i

either the index of the site frequency to consider or the index of the first term of the tail statistic. In both cases \(1 \le i \le n-1\).

nSegSites

a logical value indicating whether the function should compute the distribution of the number of segregating sites \((S_{Total} = \xi_1 + ... + \xi_{n-1})\). If TRUE, any value of \(i\) will be ignored. Defaults to FALSE.

tailStat

a logical value indicating whether the function should compute the distribution of the tail statistic \((S_{i+} = \xi_i + ... + \xi_{n-1})\). If TRUE, \(i\) will determine the first term of this statistic. Defaults to FALSE.
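
The two summary statistics are related through their definitions alone. As a short worked note (it says nothing about how the function builds its representations), the boundary values of \(i\) give:

```latex
% Boundary cases of the tail statistic S_{i+} = \xi_i + ... + \xi_{n-1}
\begin{align*}
  S_{1+}     &= \xi_1 + \xi_2 + \dots + \xi_{n-1} = S_{Total}, \\
  S_{(n-1)+} &= \xi_{n-1}.
\end{align*}
```

So the tail statistic with \(i = 1\) is the same random variable as the total number of segregating sites, and with \(i = n-1\) it reduces to the last site frequency.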

Source

Asger Hobolth, Arno Siri-Jégousse, Mogens Bladt (2019): Phase-type distributions in population genetics. Theoretical Population Biology, 127, pp. 16-32.

Value

If nSegSites = FALSE and tailStat = FALSE, the function returns the phase-type representation of the \(i\)'th site frequency \((\xi_i)\) plus one. If nSegSites = TRUE, the function returns the phase-type representation of the total number of segregating sites plus one. If tailStat = TRUE, the function returns the representation of the tail statistic (whose first term is determined by \(i\)) plus one. In all three cases, the returned object is of class discphasetype.

Details

This function can be used to compute the discrete phase-type representation of the site frequencies \(\xi_i + 1\) for all \(i \in \{1, ..., n-1\}\), of the total number of segregating sites \(S_{Total} + 1\), and of the tail statistic \(S_{i+} + 1\). One is added because a discrete phase-type distribution is supported on the natural numbers excluding zero: absorption cannot happen before the first step, so a phase-type variable is never zero. Shifting by one lets the underlying statistic itself take the value zero. Note that the package also includes the function dSegregatingSites, which computes the density function of the number of segregating sites for a given sample size \(n\), a mutation parameter \(\theta\) and a non-negative vector of quantiles \(k\).
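
To make the shift by one concrete, here is a minimal sketch in base R that recovers the probabilities of \(\xi_2\) from a representation. It assumes the standard discrete phase-type mass function, \(P(\tau = m) = \alpha P^{m-1} (I - P) e\), so that \(P(\xi_i = k) = P(\tau = k + 1)\). The matrix values are copied from the printed output of SiteFrequencies(n=4, lambda=1, i=2) in the Examples section. The helper dShifted is hypothetical and not part of the package.

```r
# Representation copied by hand from the printed output of
# SiteFrequencies(n=4, lambda=1, i=2); no package is loaded here.
initDist <- c(1, 0)
P_Mat <- matrix(c(0.25, 1/6,
                  0.00, 2/3), nrow = 2, byrow = TRUE)

# Hypothetical helper (an assumption for illustration, not a package function):
# probability that the unshifted statistic equals k, i.e. P(tau = k + 1).
dShifted <- function(k, initDist, P_Mat) {
  exitProb <- 1 - rowSums(P_Mat)      # one-step absorption probabilities, (I - P) e
  state <- initDist
  for (step in seq_len(k)) {          # k transitions among the transient states
    state <- as.vector(state %*% P_Mat)
  }
  sum(state * exitProb)               # absorb on step k + 1
}

# Probabilities that xi_2 equals 0, 1, ..., 5
probs <- sapply(0:5, dShifted, initDist = initDist, P_Mat = P_Mat)
print(probs)
```

With k = 0 the loop body never runs, so the result is the probability of absorbing on the very first step. That is exactly the event \(\xi_2 = 0\) which the shift makes representable.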

Examples

SiteFrequencies(n=4, lambda=1, i=2)
#> $initDist
#>      [,1] [,2]
#> [1,]    1    0
#>
#> $P_Mat
#>      [,1]      [,2]
#> [1,] 0.25 0.1666667
#> [2,] 0.00 0.6666667
#>
#> attr(,"class")
#> [1] "discphasetype"
SiteFrequencies(n=4, lambda=1, nSegSites=TRUE)
#> $initDist
#> [1] 1 0 0 0
#>
#> $P_Mat
#>      [,1] [,2]      [,3]       [,4]
#> [1,]  0.4  0.3 0.1333333 0.06666667
#> [2,]  0.0  0.5 0.2222222 0.11111111
#> [3,]  0.0  0.0 0.6666667 0.00000000
#> [4,]  0.0  0.0 0.0000000 0.66666667
#>
#> attr(,"class")
#> [1] "discphasetype"
SiteFrequencies(n=4, lambda=1, i=2, tailStat=TRUE)
#> $initDist
#>      [,1] [,2] [,3]
#> [1,]    1    0    0
#>
#> $P_Mat
#>      [,1] [,2]      [,3]
#> [1,] 0.25 0.25 0.1666667
#> [2,] 0.00 0.50 0.0000000
#> [3,] 0.00 0.00 0.6666667
#>
#> attr(,"class")
#> [1] "discphasetype"