Polynomial-time completion of phylogenetic tree sets

📅 2026-04-26

🤖 AI Summary
This study addresses the problem of comparing phylogenetic trees with partially overlapping taxa by introducing a polynomial-time algorithm for set-wide tree completion. The method identifies maximal completion subtrees that occur frequently across the source trees, builds a weighted majority-rule consensus from them, and scales branch lengths using rates derived from the leaves shared with the target tree. Each consensus subtree is then inserted into the target tree at the position, restricted to the target's original branches, that minimizes the squared distance error. The resulting completion is unique, independent of the order in which target trees are processed, and preserves the distances among the original taxa. On empirical datasets of amphibians, mammals, sharks, and squamates, the approach attains the lowest distance to the subset reference trees among the compared methods, in both topology and branch lengths.
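The weighted majority-rule consensus step mentioned above can be illustrated with a minimal sketch. It assumes each source subtree is represented as a set of clusters (sets of leaf names) and carries one numeric weight; the function name `weighted_majority_clusters` and the per-tree weighting are illustrative assumptions, not the authors' implementation, and how the paper derives its weights is not stated here.

```python
from collections import defaultdict


def weighted_majority_clusters(trees, weights):
    """Return clusters supported by more than half of the total weight.

    trees   -- list of sets of frozenset clusters (hypothetical representation)
    weights -- one non-negative weight per tree (assumed weighting scheme)
    """
    if len(trees) != len(weights):
        raise ValueError("need exactly one weight per tree")
    total = sum(weights)
    support = defaultdict(float)
    # Accumulate the weight of every tree that contains a given cluster.
    for clusters, weight in zip(trees, weights):
        for cluster in clusters:
            support[cluster] += weight
    # Strict majority: clusters kept this way are always mutually compatible.
    return {c for c, s in support.items() if s > total / 2}


ab, bc, abc = frozenset("AB"), frozenset("BC"), frozenset("ABC")
source_trees = [{ab, abc}, {ab, abc}, {bc, abc}]
equal = weighted_majority_clusters(source_trees, [1, 1, 1])
skewed = weighted_majority_clusters(source_trees, [1, 1, 3])
```

With equal weights the cluster {A, B} wins; giving the third tree a weight of 3 flips the majority to {B, C}, which shows how weighting changes the consensus.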

📝 Abstract
Comparative analyses of phylogenetic trees typically require identical taxon sets; in practice, however, trees often include distinct but overlapping taxa. Pruning non-shared leaves discards phylogenetic signal, whereas tree completion can preserve both taxa and branch-length information. This work introduces a polynomial-time algorithm for set-wide completion of phylogenetic trees with partial taxon overlap. The proposed method identifies and extracts maximal completion subtrees that frequently appear across the source trees and constructs a weighted majority-rule consensus. Branch lengths are scaled using rates derived from common leaves. Each consensus subtree is inserted at the position that minimizes the quadratic distance error measured against information from the source trees, with candidate positions restricted to the original branches of the target tree. We demonstrate that the algorithm runs in polynomial time and preserves distances among the original taxa, yielding a unique completion that is independent of the order in which target trees are processed. An experimental evaluation on amphibians, mammals, sharks, and squamates shows that, on every subset, the proposed method achieves the lowest distance to the subset reference trees among all compared methods, in both topology and branch lengths. An open-source Python implementation of the proposed algorithm and the biological datasets used in this study are publicly available at: https://github.com/tahiri-lab/overlap-treeset-completion/.
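The insertion step described in the abstract can be sketched for the simplest case, a single leaf attached somewhere along a candidate branch. This is a simplified illustration under stated assumptions, not the published algorithm: the pendant length is taken as given, target distances from the new leaf to the original leaves are assumed to have been estimated from the source trees, and the names `best_position_on_branch` and `best_branch` are hypothetical. For a branch (u, v) of length L and an attachment point at distance x from u, the squared error is a quadratic in x, so its minimizer is a mean of per-leaf ideal positions, clamped to [0, L].

```python
def best_position_on_branch(length, pendant, u_side, v_side):
    """Least-squares attachment point on one branch (u, v).

    length  -- branch length L
    pendant -- length of the new pendant edge (assumed known)
    u_side  -- (distance to u, target distance) pairs for leaves beyond u
    v_side  -- (distance to v, target distance) pairs for leaves beyond v
    """
    # A u-side leaf is predicted at d + x + pendant, so its residual is x - a.
    ideals = [t - pendant - d for d, t in u_side]
    # A v-side leaf is predicted at d + (L - x) + pendant: residual -(x - b).
    ideals += [length + d + pendant - t for d, t in v_side]
    if not ideals:
        raise ValueError("need at least one original leaf")
    x = sum(ideals) / len(ideals)      # unconstrained minimizer
    x = min(max(x, 0.0), length)       # stay on the original branch
    error = sum((x - a) ** 2 for a in ideals)
    return x, error


def best_branch(candidates, pendant):
    """Pick the branch and position with the smallest squared error."""
    best = None
    for name, (length, u_side, v_side) in candidates.items():
        x, error = best_position_on_branch(length, pendant, u_side, v_side)
        if best is None or error < best[2]:
            best = (name, x, error)
    return best


candidates = {
    "e1": (2.0, [(1.0, 2.5)], [(1.0, 3.5)]),
    "e2": (1.0, [(3.0, 2.5)], [(0.5, 3.5)]),
}
choice = best_branch(candidates, pendant=1.0)
```

In this toy input, branch `e1` reproduces both target distances exactly at x = 0.5, while on `e2` the unconstrained optimum falls outside the branch and is clamped to its endpoint, leaving a positive error. Restricting candidates to existing branches in this way leaves the distances among the original leaves untouched.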
Problem

Research questions and friction points this paper is trying to address.

phylogenetic tree
taxon overlap
tree completion
polynomial-time
comparative analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

polynomial-time algorithm
phylogenetic tree completion
maximal completion subtrees
branch-length scaling
quadratic distance minimization