🤖 AI Summary
Conventional population genetic models—such as the Wright–Fisher framework—often assume locus independence and neglect linkage disequilibrium (LD) and parameter uncertainty, limiting their applicability to pooled sequencing (Pool-Seq) data under environmental selection. While deep generative models hold promise, their adoption in population genomics remains constrained by high data requirements, poor interpretability, and insufficient integration of local genomic context.
Method: We introduce a deep generative neural network for Evolve-and-Resequence (E&R) experiments that estimates the distribution of SNP allele frequency trajectories by embedding each SNP's temporal observations together with information from neighboring loci, so that dependencies between loci such as LD can be recovered from the learned representations.
Contribution/Results: On simulated E&R data, the model captures the distribution of allele frequency trajectories, and inspecting its internally learned representations yields pairwise LD estimates, which are typically inaccessible in Pool-Seq data. These estimates are competitive with existing methods in data with a high degree of LD, which addresses a key limitation of independent-site assumptions.
📝 Abstract
The investigation of allele frequency trajectories in populations evolving under controlled environmental pressures has become a popular approach to studying evolutionary processes at the molecular level. Statistical models based on well-defined evolutionary concepts can be used to test different hypotheses about empirical observations. Despite their popularity, classic statistical models such as the Wright-Fisher model suffer from simplifying assumptions, such as the independence of selected loci along a chromosome, and from uncertainty about their parameters. Deep generative neural networks offer a powerful alternative, known for integrating multivariate dependencies and for noise reduction. Because of their high data demands and limited interpretability, they have so far not been widely considered in population genomics. To address the challenges of Evolve and Resequence (E&R) experiments based on pooled sequencing (Pool-Seq) data, we introduce a deep generative neural network that aims to model a concept of evolution based on empirical observations over time. The proposed model estimates the distribution of allele frequency trajectories by embedding the observations from single nucleotide polymorphisms (SNPs) together with information from neighboring loci. Evaluation on simulated E&R experiments demonstrates the model's ability to capture the distribution of allele frequency trajectories and illustrates the representational power of deep generative models using the example of linkage disequilibrium (LD) estimation. Inspecting the internally learned representations enables the estimation of pairwise LD, which is typically inaccessible in Pool-Seq data. Compared to existing methods, our model provides competitive LD estimation in Pool-Seq data with a high degree of LD.
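For readers unfamiliar with the classical baseline mentioned above, the following is a minimal sketch of a single-locus, haploid Wright-Fisher simulation with Pool-Seq read sampling layered on top. It is not the proposed generative model. The function names, the selection coefficient `s`, the population size, the coverage, and the sampling interval are all illustrative assumptions rather than values taken from the paper. The sketch treats the locus in isolation, which is exactly the independence assumption the abstract criticizes.

```python
import random


def binomial(rng, n, p):
    """Draw from Binomial(n, p) via n Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def wright_fisher_trajectory(p0, pop_size, generations, s=0.0, seed=0):
    """Simulate one allele's frequency in a haploid Wright-Fisher population.

    s is a hypothetical selection coefficient; s = 0 gives pure genetic drift.
    """
    rng = random.Random(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        # Selection shifts the expected frequency before reproduction.
        p_sel = p * (1 + s) / (1 + s * p)
        # Drift: the next generation is a binomial sample of pop_size copies.
        p = binomial(rng, pop_size, p_sel) / pop_size
        freqs.append(p)
    return freqs


def pool_seq_observe(freqs, coverage, seed=1):
    """Add Pool-Seq sampling noise: observed frequency = allele reads / coverage."""
    rng = random.Random(seed)
    return [binomial(rng, coverage, p) / coverage for p in freqs]


# Assumed setup: 60 generations, sequenced every 10 generations at 80x coverage.
true_freqs = wright_fisher_trajectory(p0=0.2, pop_size=500, generations=60, s=0.05)
observed = pool_seq_observe(true_freqs[::10], coverage=80)
```

Because each observed time point is only a pooled allele count, the haplotype information needed to compute pairwise LD directly is lost, which is why the abstract describes LD as typically inaccessible in Pool-Seq data.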