Application of the Discriminant Analysis of Principal Components (DAPC) Method in Genetic Data Analysis
Master's thesis defense: Doaa El-Chamma
Title: Application of the Discriminant Analysis of Principal Components (DAPC) Method in Genetic Data Analysis
Abstract: Understanding the genetic structure of populations is crucial for unraveling the evolutionary history and demographic dynamics of diverse organisms. This study introduces the multivariate method Discriminant Analysis of Principal Components (DAPC), which facilitates statistical inference about genetic structure, identification of population clusters, and assessment of individual admixture or ancestry proportions in genetic datasets. DAPC was applied to three datasets: human microsatellites, seasonal influenza (H3N2) hemagglutinin, and a genome sequencing dataset. These applications illustrate the method and the challenges and limitations encountered. Additionally, we conducted a simulation study to examine the effects of sample size, migration rate, and mutation rate on the DAPC analysis, and then investigated how these parameters influence the resulting data size. The analysis of the empirical datasets highlights the difficulty of determining the optimal number of principal components (PCs). It underscores the necessity of considering both statistical and biological factors when using DAPC with sequential K-means and model selection to infer genetic clusters. In the simulations, identical sample sizes across populations were most effective in accurately reflecting the true population structure, which comprised four distinct populations. Lower migration rates facilitated more accurate clustering, maintaining clear separation between populations. Mutation rates showed consistent behavior across scenarios, indicating a less pronounced but still significant impact on clustering outcomes. Populations with differing sample sizes yielded nearly equal data sizes, while low migration rates and high mutation rates yielded large data sizes.
Supervisor: Carsten Wiuf
External examiner: Birgit Debrabant, SDU