🤖 AI Summary
This study addresses a limitation of traditional approaches, which assume a log-linear relationship between environmental covariates and effective population size (Ne(t)); this assumption often fails to capture the true nonlinear drivers and leads to biased inference. To overcome this, we propose a novel Bayesian framework that, for the first time, incorporates a Gaussian process prior within a coalescent model to capture the nonlinear influence of covariates on Ne(t) flexibly and nonparametrically, without pre-specifying a functional form. A Gaussian Markov random field prior enforces temporal smoothness of the inferred trajectories. Evaluated on simulations and empirical datasets (yellow fever virus, musk ox, and HIV-1), the method accurately distinguishes linear from nonlinear effects and substantially outperforms conventional linear models, thereby enhancing both the biological realism of demographic inference and the quantification of associated uncertainty.
📝 Abstract
Effective population size (Ne(t)) is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods enable the inference of Ne(t) trajectories through time from phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking Ne(t) to external covariates. Existing approaches typically impose log-linear relationships between covariates and Ne(t), which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant Ne(t) through a Gaussian process (GP) prior. The GP, a distribution over functions, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-Ne(t) relationships, mitigates bias under nonlinear associations, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in Ne(t) trajectories. Through simulation studies and three empirical applications (yellow fever virus dynamics in Brazil, 2016–2018; late-Quaternary musk ox demography; and HIV-1 CRF02-AG evolution in Cameroon), we demonstrate that our method both confirms linear relationships where appropriate and reveals nonlinear covariate effects that would otherwise be missed or mischaracterized. This framework advances phylodynamic inference by enabling more accurate and biologically realistic modeling of how environmental and epidemiological factors shape population size through time.
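To make the prior structure described above concrete, the following is a minimal Python sketch of one way a GP over covariate values and a first-order random-walk GMRF over time could jointly define a piecewise-constant log Ne(t). It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel, how the two priors are combined, or any hyperparameters, so the squared-exponential kernel, the additive combination `f + w`, and all names (`sq_exp_kernel`, `rw1_log_density`, `sample_log_ne`, `lengthscale`, `tau`) are hypothetical choices made for this example.

```python
import numpy as np

def sq_exp_kernel(x, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between all pairs of covariate values.
    (Assumed kernel; the abstract does not name one.)"""
    diff = x[:, None] - x[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def rw1_log_density(w, tau):
    """Log-density, up to an additive constant, of a first-order random-walk
    GMRF with precision tau: it penalizes squared differences between
    neighboring time intervals."""
    n = len(w)
    return 0.5 * (n - 1) * np.log(tau) - 0.5 * tau * np.sum(np.diff(w) ** 2)

def sample_log_ne(x, rng, lengthscale=1.0, variance=1.0, tau=10.0, jitter=1e-6):
    """Draw log Ne on a time grid as f(x_k) + w_k (assumed additive form).

    x   : covariate value in each of the n grid intervals
    f   : GP draw indexed by covariate value (global covariate effect)
    w   : RW1 GMRF draw indexed by time (local temporal smoothness)
    """
    n = len(x)
    # Jitter on the diagonal keeps the Cholesky factorization stable.
    K = sq_exp_kernel(x, lengthscale, variance) + jitter * np.eye(n)
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)
    # RW1 sample: cumulative sum of N(0, 1/tau) increments, then centered
    # so that the overall level is carried by f rather than w.
    steps = rng.standard_normal(n - 1) / np.sqrt(tau)
    w = np.concatenate(([0.0], np.cumsum(steps)))
    w -= w.mean()
    return f + w, f, w

# Example: 40 grid intervals with a synthetic, oscillating covariate.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0.0, 3.0 * np.pi, 40))
log_ne, f, w = sample_log_ne(x, rng)
ne = np.exp(log_ne)  # piecewise-constant Ne, one value per grid interval
```

Because `f` depends on time only through the covariate value, intervals with similar covariate values share similar covariate effects even when they are far apart in time, while `w` absorbs smooth temporal variation that the covariate does not explain. In a full analysis these priors would be combined with a coalescent likelihood and sampled by MCMC, which this sketch does not attempt.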