🤖 AI Summary
Existing branch-specific substitution models for detecting shifts in selection pressure or mutation rate along phylogenetic trees typically rely on prior knowledge of change-point locations or scale poorly to large datasets.
Method: We propose an automated change-point detection framework that requires no prior knowledge of change-point locations. It combines branch-specific evolutionary models with shrinkage priors, enabling scalable inference in high-dimensional parameter spaces. An analytical gradient algorithm, whose computation time is linear in the number of parameters, and Hamiltonian Monte Carlo sampling substantially improve computational efficiency.
Contribution/Results: Applied to the BRCA1 gene in primates and to mpox virus sequences, the method identifies shifts in selection pressure and in mutational dynamics. It speeds up maximum-likelihood optimization by up to 90× per iteration relative to a central-difference numerical gradient, and Bayesian inference by up to 360× relative to a conventional univariate random-walk sampler. To our knowledge, this is the first end-to-end, fully automated method for detecting mutation pattern change-points on phylogenies, combining statistical rigor with computational scalability.
📝 Abstract
Branch-specific substitution models are popular for detecting evolutionary change-points, such as shifts in selective pressure. However, applying such models typically requires prior knowledge of change-point locations on the phylogeny or faces scalability issues with large data sets. To address both limitations, we integrate branch-specific substitution models with shrinkage priors to automatically identify change-points without prior knowledge, while simultaneously estimating distinct substitution parameters for each branch. To enable tractable inference under this high-dimensional model, we develop an analytical gradient algorithm for the branch-specific substitution parameters whose computation time is linear in the number of parameters. We apply this gradient algorithm to infer selection pressure dynamics in the evolution of the BRCA1 gene in primates and mutational dynamics in viral sequences from the recent mpox epidemic. Our novel algorithm enhances inference efficiency, achieving up to a 90-fold speedup per iteration in maximum-likelihood optimization compared with a central-difference numerical gradient method, and up to a 360-fold improvement in computational performance within a Bayesian framework when using a Hamiltonian Monte Carlo sampler instead of a conventional univariate random-walk sampler.
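To illustrate why the numerical-gradient baseline becomes expensive as the number of branch-specific parameters grows, the following minimal sketch counts objective evaluations for a central-difference gradient. It is not the paper's algorithm: the quadratic `loglik` is a hypothetical stand-in for a phylogenetic log-likelihood, and all function names are assumptions made for this example. The only point shown is that central differences need two objective evaluations per parameter, whereas a closed-form gradient is obtained in a single pass.

```python
import random


def make_counted_loglik(weights):
    """Return a toy objective and a counter of how often it is evaluated.

    Hypothetical stand-in: one parameter per branch, quadratic penalty
    around 1.0. A real phylogenetic likelihood would be far more costly.
    """
    calls = {"n": 0}

    def loglik(theta):
        calls["n"] += 1
        return -sum(w * (t - 1.0) ** 2 for w, t in zip(weights, theta))

    return loglik, calls


def central_difference_gradient(f, theta, h=1e-5):
    """Approximate the gradient with two evaluations of f per parameter."""
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad


def analytic_gradient(weights, theta):
    """Closed-form gradient of the toy objective, computed in one pass."""
    return [-2.0 * w * (t - 1.0) for w, t in zip(weights, theta)]


rng = random.Random(0)
n_branches = 50  # illustrative size, not taken from the paper
weights = [rng.uniform(0.5, 2.0) for _ in range(n_branches)]
theta = [rng.uniform(0.1, 3.0) for _ in range(n_branches)]

loglik, calls = make_counted_loglik(weights)
numeric = central_difference_gradient(loglik, theta)
exact = analytic_gradient(weights, theta)

max_abs_diff = max(abs(a - b) for a, b in zip(numeric, exact))
print("objective evaluations for the numerical gradient:", calls["n"])
print("largest difference from the analytic gradient:", max_abs_diff)
```

In this toy setting the numerical gradient uses `2 * n_branches` objective evaluations, so its cost grows with the number of parameters multiplied by the cost of one evaluation. Reported speedups in a real analysis depend on the data, the model, and the implementation, so this sketch should not be read as reproducing the 90-fold or 360-fold figures.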