Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data

Publication: Contribution to journal › Journal article › Research › peer-reviewed

Standard

Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data. / Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders.

In: G3: Genes, Genomes, Genetics (Bethesda), Vol. 8, No. 2, 2018, pp. 551-566.


Harvard

Soraggi, S, Wiuf, C & Albrechtsen, A 2018, 'Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data', G3: Genes, Genomes, Genetics (Bethesda), vol. 8, no. 2, pp. 551-566. https://doi.org/10.1534/g3.117.300192

APA

Soraggi, S., Wiuf, C., & Albrechtsen, A. (2018). Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data. G3: Genes, Genomes, Genetics (Bethesda), 8(2), 551-566. https://doi.org/10.1534/g3.117.300192

Vancouver

Soraggi S, Wiuf C, Albrechtsen A. Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data. G3: Genes, Genomes, Genetics (Bethesda). 2018;8(2):551-566. https://doi.org/10.1534/g3.117.300192

Author

Soraggi, Samuele ; Wiuf, Carsten ; Albrechtsen, Anders. / Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data. In: G3: Genes, Genomes, Genetics (Bethesda). 2018 ; Vol. 8, No. 2. pp. 551-566.

Bibtex

@article{99e378fbd8d24eac8937beb6078606dc,

title = "Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data",

abstract = "The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low/medium sequencing depth (1-10X), and performance is as good as with perfectly called genotypes at a sequencing depth of 2X. We show the reliability of error correction on scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.",

author = "Samuele Soraggi and Carsten Wiuf and Anders Albrechtsen",

year = "2018",

doi = "10.1534/g3.117.300192",

language = "English",

volume = "8",

pages = "551--566",

journal = "G3: Genes, Genomes, Genetics (Bethesda)",

issn = "2160-1836",

publisher = "Genetics Society of America",

number = "2",

}

RIS

TY - JOUR

T1 - Powerful Inference With the D-Statistic on Low-Coverage Whole-Genome Data

AU - Soraggi, Samuele

AU - Wiuf, Carsten

AU - Albrechtsen, Anders

PY - 2018

Y1 - 2018

N2 - The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low/medium sequencing depth (1-10X), and performance is as good as with perfectly called genotypes at a sequencing depth of 2X. We show the reliability of error correction on scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.

AB - The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixture. The power gain is most pronounced for low/medium sequencing depth (1-10X), and performance is as good as with perfectly called genotypes at a sequencing depth of 2X. We show the reliability of error correction on scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates.

U2 - 10.1534/g3.117.300192

DO - 10.1534/g3.117.300192

M3 - Journal article

C2 - 29196497

VL - 8

SP - 551

EP - 566

JO - G3: Genes, Genomes, Genetics (Bethesda)

JF - G3: Genes, Genomes, Genetics (Bethesda)

SN - 2160-1836

IS - 2

ER -
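The abstract contrasts the traditional D-statistic, which samples a single base per population, with an approach that uses all reads from multiple individuals. The Python sketch below illustrates that contrast under stated assumptions; it is not the authors' estimator. It assumes each population H1 to H4 at a site is summarised by a hypothetical `(allele_reads, total_reads)` pair pooled over all individuals, and it uses the pooled read fraction as the allele frequency p. If one allele is drawn independently from each population, the expected ABBA minus BABA count at a site is (p2 - p1)(p3 - p4), and the expected ABBA plus BABA count is (p1 + p2 - 2p1p2)(p3 + p4 - 2p3p4). D is the ratio of the sums of these terms over sites. The sign convention is an assumption, since some tools use BABA minus ABBA. The standard error comes from a delete-one-block jackknife, a common choice that is assumed here, and the resulting Z-score is compared with a standard normal distribution. The paper's weighting of individuals, its type-specific error correction and its introgression correction are not reproduced. All function names (`site_terms`, `usable`, `d_statistic`, `jackknife_z`, `sample_one_read`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical input layout (an assumption, not from the paper): for each site,
# populations H1..H4 are each summarised as (allele_reads, total_reads), pooled
# over all reads of all sampled individuals, always counting the same allele.
Counts = Tuple[int, int]
Site = Tuple[Counts, Counts, Counts, Counts]


def site_terms(site: Site) -> Tuple[float, float]:
    """Expected (ABBA - BABA, ABBA + BABA) for one site from pooled frequencies."""
    p1, p2, p3, p4 = (a / n for a, n in site)
    numerator = (p2 - p1) * (p3 - p4)
    denominator = (p1 + p2 - 2 * p1 * p2) * (p3 + p4 - 2 * p3 * p4)
    return numerator, denominator


def usable(sites: Sequence[Site]) -> List[Site]:
    """Drop sites where any population has no reads (frequency undefined)."""
    return [s for s in sites if all(n > 0 for _, n in s)]


def d_statistic(sites: Sequence[Site]) -> float:
    """Ratio of summed numerators to summed denominators over usable sites."""
    terms = [site_terms(s) for s in usable(sites)]
    return sum(t[0] for t in terms) / sum(t[1] for t in terms)


def jackknife_z(sites: Sequence[Site], block_size: int) -> Tuple[float, float, float]:
    """Return (D, standard error, Z) from a delete-one-block jackknife.

    Blocks are consecutive runs of `block_size` usable sites, and at least two
    blocks are required. Under no admixture, Z is compared with N(0, 1).
    """
    terms = [site_terms(s) for s in usable(sites)]
    blocks = [terms[i:i + block_size] for i in range(0, len(terms), block_size)]
    nums = [sum(t[0] for t in b) for b in blocks]
    dens = [sum(t[1] for t in b) for b in blocks]
    total_num, total_den = sum(nums), sum(dens)
    d = total_num / total_den
    g = len(blocks)
    # D recomputed with each block left out in turn.
    loo = [(total_num - nb) / (total_den - db) for nb, db in zip(nums, dens)]
    mean = sum(loo) / g
    se = math.sqrt((g - 1) / g * sum((x - mean) ** 2 for x in loo))
    return d, se, d / se


def sample_one_read(site: Site, rng: random.Random) -> Site:
    """Traditional single-base approach: keep one uniformly drawn read per population.

    Assumes every population has at least one read at the site.
    """
    return tuple((1 if rng.randrange(n) < a else 0, 1) for a, n in site)


if __name__ == "__main__":
    abba = ((0, 10), (10, 10), (10, 10), (0, 10))
    baba = ((10, 10), (0, 10), (10, 10), (0, 10))
    print(d_statistic([abba, baba]))  # 0.0
    print(jackknife_z([abba, abba, baba, abba], block_size=1))
```

In the usage example, one ABBA-like site and one BABA-like site cancel out exactly. The second call returns approximately (0.5, 0.5, 1.0), up to floating-point rounding. Passing each site through `sample_one_read` before calling `d_statistic` mimics the single-base approach. Comparing the two results on simulated low-depth data is one way to see the information loss that the abstract describes.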

ID: 188091224

Department of Mathematical Sciences