Multivariate phase-type theory for the site frequency spectrum

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Standard

Multivariate phase-type theory for the site frequency spectrum. / Hobolth, Asger; Bladt, Mogens; Andersen, Lars Nørvang.

In: Journal of Mathematical Biology, Vol. 83, No. 6-7, 63, 2021.


Harvard

Hobolth, A, Bladt, M & Andersen, LN 2021, 'Multivariate phase-type theory for the site frequency spectrum', Journal of Mathematical Biology, vol. 83, no. 6-7, 63. https://doi.org/10.1007/s00285-021-01689-w

APA

Hobolth, A., Bladt, M., & Andersen, L. N. (2021). Multivariate phase-type theory for the site frequency spectrum. Journal of Mathematical Biology, 83(6-7), Article 63. https://doi.org/10.1007/s00285-021-01689-w

Vancouver

Hobolth A, Bladt M, Andersen LN. Multivariate phase-type theory for the site frequency spectrum. Journal of Mathematical Biology. 2021;83(6-7):63. https://doi.org/10.1007/s00285-021-01689-w

Author

Hobolth, Asger ; Bladt, Mogens ; Andersen, Lars Nørvang. / Multivariate phase-type theory for the site frequency spectrum. In: Journal of Mathematical Biology. 2021 ; Vol. 83, No. 6-7.

Bibtex

@article{e74c7db8fa9748369d9cc1252cf5e51c,
title = "Multivariate phase-type theory for the site frequency spectrum",
abstract = "Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima{\textquoteright}s D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package PhaseTypeR, and R code for the reproduction of our results is available as an accompanying vignette.",
keywords = "Coalescent theory, Mutation rate, Phase-type distribution, Site frequency spectrum",
author = "Asger Hobolth and Mogens Bladt and Andersen, {Lars N{\o}rvang}",
year = "2021",
doi = "10.1007/s00285-021-01689-w",
language = "English",
volume = "83",
journal = "Journal of Mathematical Biology",
issn = "0303-6812",
publisher = "Springer",
number = "6-7",

}
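The abstract states that classical mutation-rate estimators, such as the one based on the total number of segregating sites, follow a discrete phase-type distribution. A minimal Python sketch of that distribution follows. It assumes the standard Kingman coalescent with infinite-sites mutation at scaled rate theta. While k lineages remain, the number of mutations before the next coalescence is geometric with success probability (k-1)/(k-1+theta), so the segregating-sites count S_n is a sum of n-1 independent geometric counts. The paper works with phase-type matrix representations (implemented in PhaseTypeR). This sketch instead uses plain convolution of the geometric laws, and the function name `segregating_sites_pmf` is hypothetical.

```python
from math import fsum


def segregating_sites_pmf(n, theta, s_max):
    """Return [P(S_n = 0), ..., P(S_n = s_max)] for a sample of size n.

    Assumed model (sketch): standard coalescent, infinite-sites mutation.
    With k lineages, coalescence occurs at rate k(k-1)/2 and mutation at
    rate k*theta/2, so the number of mutations before the next coalescence
    is geometric on {0, 1, ...} with success probability
    p_k = (k - 1) / (k - 1 + theta). S_n sums these n - 1 counts.
    """
    pmf = [1.0] + [0.0] * s_max  # point mass at 0 before any level is added
    for k in range(2, n + 1):
        p = (k - 1) / (k - 1 + theta)
        geom = [p * (1 - p) ** j for j in range(s_max + 1)]
        # Convolve with the level-k geometric law; entries 0..s_max remain
        # exact because only indices <= s are ever needed for entry s.
        pmf = [fsum(pmf[i] * geom[s - i] for i in range(s + 1))
               for s in range(s_max + 1)]
    return pmf


if __name__ == "__main__":
    n, theta = 10, 2.0
    pmf = segregating_sites_pmf(n, theta, s_max=400)
    a_n = sum(1 / i for i in range(1, n))
    mean = fsum(s * q for s, q in enumerate(pmf))
    print(f"P(S=0) = {pmf[0]:.6f}")
    print(f"E[S] = {mean:.6f}  (theta * a_n = {theta * a_n:.6f})")
```

Each geometric count has mean theta/(k-1), so E[S_n] = theta * a_n with a_n = 1 + 1/2 + ... + 1/(n-1). This is why Watterson's estimator S_n / a_n is unbiased, and its full distribution is the pmf above with the support rescaled by 1/a_n. For n = 10 and theta = 2 the zero-class probability telescopes to 1/55.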

RIS

TY - JOUR

T1 - Multivariate phase-type theory for the site frequency spectrum

AU - Hobolth, Asger

AU - Bladt, Mogens

AU - Andersen, Lars Nørvang

PY - 2021

Y1 - 2021

N2 - Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima’s D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package PhaseTypeR, and R code for the reproduction of our results is available as an accompanying vignette.

AB - Linear functions of the site frequency spectrum (SFS) play a major role in understanding and investigating genetic diversity. Estimators of the mutation rate (e.g. based on the total number of segregating sites or average of the pairwise differences) and tests for neutrality (e.g. Tajima’s D) are perhaps the most well-known examples. The distribution of linear functions of the SFS is important for constructing confidence intervals for the estimators, and for determining significance thresholds for neutrality tests. These distributions are often approximated using simulation procedures. In this paper we use multivariate phase-type theory to specify, characterize and calculate the distribution of linear functions of the site frequency spectrum. In particular, we show that many of the classical estimators of the mutation rate are distributed according to a discrete phase-type distribution. Neutrality tests, however, are generally not discrete phase-type distributed. For neutrality tests we derive the probability generating function using continuous multivariate phase-type theory, and numerically invert the function to obtain the distribution. A main result is an analytically tractable formula for the probability generating function of the SFS. Software implementation of the phase-type methodology is available in the R package PhaseTypeR, and R code for the reproduction of our results is available as an accompanying vignette.

KW - Coalescent theory

KW - Mutation rate

KW - Phase-type distribution

KW - Site frequency spectrum

UR - http://www.scopus.com/inward/record.url?scp=85119125733&partnerID=8YFLogxK

U2 - 10.1007/s00285-021-01689-w

DO - 10.1007/s00285-021-01689-w

M3 - Journal article

C2 - 34783900

AN - SCOPUS:85119125733

VL - 83

JO - Journal of Mathematical Biology

JF - Journal of Mathematical Biology

SN - 0303-6812

IS - 6-7

M1 - 63

ER -

ID: 285525155
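The abstract's starting point is that mutation-rate estimators are linear functions of the SFS, whereas a neutrality test such as Tajima's D is not. A minimal Python sketch of that distinction follows. It assumes an unfolded SFS xi_1, ..., xi_{n-1}, where xi_i counts sites whose derived allele is carried by i of the n sequences. It computes the segregating-sites count S, Watterson's estimator, and the pairwise-difference estimator theta_pi, each as a weighted sum of the xi_i. It then computes Tajima's D using the standard Tajima (1989) variance constants. The function name `sfs_statistics` is hypothetical and is not part of PhaseTypeR.

```python
from math import comb, sqrt


def sfs_statistics(xi):
    """Linear SFS statistics and Tajima's D for an unfolded SFS (sketch).

    xi[i - 1] = number of sites where the derived allele is carried by
    i of the n sampled sequences, i = 1, ..., n - 1 (so n = len(xi) + 1).
    """
    n = len(xi) + 1
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))

    # Each estimator is sum_i w_i * xi_i for a fixed weight vector w.
    S = sum(xi)                                    # w_i = 1
    theta_w = S / a1                               # w_i = 1 / a1
    theta_pi = sum(i * (n - i) * x                 # w_i = i(n-i) / C(n,2)
                   for i, x in enumerate(xi, start=1)) / comb(n, 2)

    # Tajima's D: a linear numerator divided by a data-dependent
    # standard deviation, so D itself is not linear in xi.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    denom = sqrt(e1 * S + e2 * S * (S - 1))
    # D is undefined when the variance estimate vanishes (e.g. S = 0).
    D = (theta_pi - theta_w) / denom if denom > 0 else float("nan")
    return {"S": S, "theta_W": theta_w, "theta_pi": theta_pi, "D": D}


if __name__ == "__main__":
    print(sfs_statistics([2, 1, 1]))
```

For example, xi = [2, 1, 1] (n = 4) gives S = 4, theta_W = 24/11 and theta_pi = 13/6. Because theta_pi is slightly below theta_W, D is slightly negative. The division by a data-dependent standard deviation is why the distribution of D is not simply that of a weighted sum of SFS entries.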