PhD Defense: Samuele Soraggi

Title: Theory and inference on gene flow and ploidy levels from NGS data


This thesis focuses on the theoretical study, statistical analysis and implementation of models for genetic data produced with Next Generation Sequencing (NGS) techniques. With the advent of NGS technologies, scientists have gained access to large amounts of DNA data at unprecedentedly low cost and high speed.

However, this type of data suffers from frequent errors, and some genetic information is difficult to infer from it.

Here, a statistical method called the four-population test is analyzed and improved so that it can be applied to NGS data. This test makes it possible to check whether a set of four populations satisfies a certain genetic relationship.

More complex genetic relationships, illustrated by graphs called “admixture graphs”, have become of interest thanks to the amount of DNA data available. A mathematical framework based on moment statistics is proposed and related to its applications in population genetics.

Using genotype likelihoods, a method for inferring the ploidy numbers of an organism from NGS data is proposed. Ploidy numbers play an important role in the speciation of organisms such as plants and fungi.


Supervisor: Carsten Wiuf, MATH, University of Copenhagen

Co-supervisor: Ass. Prof. Anders Albrechtsen, Bioinformatik og RNA Biologi


Assessment Committee

Ass. Prof. Hans Siegismund (Chair), Department of Biology, University of Copenhagen

Prof. Jeff Wall, University of California, San Francisco

Ass. Prof. Thomas Mailund, Aarhus University