PhylaFlow: Hybrid Flow Matching in Billera-Holmes-Vogtmann Tree Space for Phylogenetic Inference

📅 2026-05-20

🤖 AI Summary
This work addresses efficient Bayesian inference in the Billera–Holmes–Vogtmann (BHV) space of phylogenetic trees, whose geometry is complex because it combines continuous branch lengths with discrete topologies. The authors propose PhylaFlow, the first method to introduce hybrid flow matching into BHV space. It jointly models continuous branch-length motion within orthants and discrete topology transitions across orthant boundaries, and is trained on BHV geodesic paths from random starting trees to short-run posterior samples. On the DS1–DS8 benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers and, after finite-budget MCMC refinement, improves early and intermediate topology recovery on most datasets. Under the same refinement budget, the best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight. In a separate sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary.
📝 Abstract
Phylogenetic trees are hybrid objects: branch lengths vary continuously, while topologies change discretely through edge contractions and expansions. Billera-Holmes-Vogtmann (BHV) tree space provides a canonical geometry for this structure, representing each resolved topology as a Euclidean orthant and topological changes as motion across shared lower-dimensional boundaries. We introduce PhylaFlow, a hybrid flow-matching model that learns posterior-basin transport in BHV tree space. PhylaFlow is trained on BHV geodesic paths from random starting trees to short-run posterior samples, coupling continuous branch-length motion within orthants with learned boundary events and discrete topology transitions. We evaluate the learned geometry operationally: if the flow reaches posterior-relevant regions, finite-budget Bayesian refinement initialized from, or guided by, its terminal trees should recover posterior-supported topologies more efficiently. Across DS1-DS8 phylogenetic posterior benchmarks, PhylaFlow substantially reduces initial Tree-KL relative to classical initializers. After finite-budget MrBayes refinement, direct PhylaFlow improves early and intermediate topology-recovery trajectories on most datasets, while split-guided PhylaFlow-MCMC obtains the strongest hard-case results. The best PhylaFlow variant outperforms short-warmup on seven of eight datasets and PhyloGFN on five of eight under the same refinement budget. In a joint sequence-conditioned experiment, sequence embeddings steer posterior split recovery, although exact posterior topology recovery remains preliminary. These results show that hybrid flow matching can learn actionable transport in BHV tree space and provide a geometry-aware proposal mechanism for Bayesian phylogenetic inference.
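To make the orthant picture concrete, here is a minimal sketch of the two simplest kinds of path in BHV tree space. It is not the authors' implementation. It assumes each tree is given only as a list of internal branch lengths in its orthant, and the function names are hypothetical. Within one orthant (same topology) the geodesic is a straight Euclidean segment, so the usual linear-path flow-matching target is the constant velocity `x1 - x0`. For two trees with no shared splits, the sketch uses the cone path, which shrinks all source edges to the star tree at the origin and then grows the target edges. The cone path is always a valid path and an upper bound on the geodesic length, but true BHV geodesics can be shorter and need a dedicated algorithm (for example Owen and Provan's), which is not shown here.

```python
import math

def same_orthant_point(x0, x1, t):
    """Point at time t in [0, 1] on the straight segment from x0 to x1.
    Valid as a BHV geodesic only when both trees share one topology."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def same_orthant_velocity(x0, x1):
    """Constant velocity along that segment: the standard regression
    target for a linear conditional flow-matching path."""
    return [b - a for a, b in zip(x0, x1)]

def cone_path_point(x0, x1, t):
    """Point at time t in [0, 1] on the cone path, traversed at constant speed.
    Assumes the two topologies share no splits (illustrative case only).
    Returns (orthant, lengths): orthant 0 is the source topology,
    orthant 1 is the target topology."""
    n0 = math.sqrt(sum(a * a for a in x0))
    n1 = math.sqrt(sum(b * b for b in x1))
    s = t * (n0 + n1)  # arc length travelled so far
    if s < n0:
        # Continuous phase 1: contract every source edge toward length 0.
        scale = (n0 - s) / n0
        return 0, [scale * a for a in x0]
    # Boundary event at s == n0: the discrete topology switches at the star tree.
    # Continuous phase 2: expand every target edge from length 0.
    scale = (s - n0) / n1 if n1 > 0 else 0.0
    return 1, [scale * b for b in x1]
```

For example, `cone_path_point([3.0, 4.0], [0.0, 5.0], 0.25)` is still in the source orthant with every edge at half its starting length, while `t = 0.75` lands in the target orthant with the target edges half grown. The topology switch at the origin is the simplest instance of the boundary events that a hybrid model has to handle alongside the continuous motion.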
Problem

Research questions and friction points this paper is trying to address.

phylogenetic inference
BHV tree space
hybrid geometry
topology transition
branch lengths
Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid flow matching
BHV tree space
phylogenetic inference
topology transition
posterior transport