Credible Sets of Phylogenetic Tree Topology Distributions

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In Bayesian phylogenetic inference, the space of tree topologies is vast and posterior distributions are often diffuse, so conventional frequency-based approaches are inadequate for constructing reliable credible sets. To address this, we propose an α credible set framework based on Conditional Clade Distributions (CCDs): (i) we define the α credible CCD, a CCD whose trees collectively make up α probability, which allows the credible level of individual trees and subtrees to be quantified; (ii) we integrate the Highest Posterior Density (HPD) principle with simulated annealing sampling to handle diffuse posteriors, where standard methods fail because sampled trees are often unique; (iii) we use the credible set methods for rank-uniformity validation and Empirical Cumulative Distribution Function (ECDF) plots, supplementing standard coverage analyses. Experiments on simulated and real datasets evaluate the accuracy of the credible set methods, and well-calibrated simulation studies are used to compare the performance of different CCD models.

📝 Abstract

Credible intervals and credible sets, such as highest posterior density (HPD) intervals, form an integral statistical tool in Bayesian phylogenetics, both for phylogenetic analyses and for development. While they are readily available for continuous parameters such as base frequencies and clock rates, the vast and complex space of tree topologies poses significant challenges for defining analogous credible sets. Traditional frequency-based approaches are inadequate for diffuse posteriors where sampled trees are often unique. To address this, we introduce novel and efficient methods for estimating the credible level of individual tree topologies using tractable tree distributions, specifically Conditional Clade Distributions (CCDs). Furthermore, we propose a new concept called the $\alpha$ credible CCD, which encapsulates a CCD whose trees collectively make up $\alpha$ probability. We present algorithms to compute these credible CCDs efficiently and to determine credible levels of tree topologies as well as of subtrees. We evaluate the accuracy of these credible set methods using simulated and real datasets. Furthermore, to demonstrate the utility of our methods, we use well-calibrated simulation studies to evaluate the performance of different CCD models. In particular, we show how the credible set methods can be used to conduct rank-uniformity validation and produce Empirical Cumulative Distribution Function (ECDF) plots, supplementing standard coverage analyses for continuous parameters.
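
To make the limitation of the traditional frequency-based approach concrete, the following is a minimal sketch of that baseline, not the CCD-based method of the paper. Topologies are assumed to be represented as canonical strings, and the function names `hpd_credible_set` and `credible_level` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def hpd_credible_set(sampled_topologies, alpha):
    """Frequency-based HPD set: add topologies from most to least
    frequent until their combined sample frequency reaches alpha."""
    counts = Counter(sampled_topologies)
    total = sum(counts.values())
    credible, covered = [], 0
    # Sort by decreasing count; break ties by name for determinism.
    for topo, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if covered / total >= alpha:
            break
        credible.append(topo)
        covered += c
    return credible, covered / total


def credible_level(sampled_topologies, tree):
    """Combined frequency of all topologies sampled at least as often
    as `tree`; returns None if `tree` was never sampled."""
    counts = Counter(sampled_topologies)
    if tree not in counts:
        return None
    total = sum(counts.values())
    return sum(c for c in counts.values() if c >= counts[tree]) / total


# A concentrated posterior sample (hypothetical data).
samples = ["((A,B),C)"] * 6 + ["((A,C),B)"] * 3 + ["((B,C),A)"]
set_80, mass_80 = hpd_credible_set(samples, 0.8)

# A diffuse sample in which every topology is unique.
diffuse = ["t1", "t2", "t3", "t4"]
diffuse_level = credible_level(diffuse, "t1")
```

In the diffuse case every sampled topology ties at the same frequency, so each one receives credible level 1.0 and any unsampled topology receives none at all, which is the failure the abstract describes.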

Problem

Research questions and friction points this paper is trying to address.

Defining credible sets for complex tree topology distributions

Estimating credible levels using Conditional Clade Distributions (CCDs)

Validating rank-uniformity and coverage in phylogenetic analyses
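
The rank-uniformity idea can be sketched as follows, under the assumption that in a well-calibrated simulation study the credible level of the true tree is roughly uniform on [0, 1], so its ECDF should lie close to the diagonal. The helper names `ecdf` and `max_deviation_from_uniform` are hypothetical, and the input credible levels are assumed to have been computed elsewhere, one per simulation replicate.

```python
def ecdf(values, grid):
    """Empirical CDF of `values` evaluated at each point of `grid`."""
    n = len(values)
    return [sum(v <= g for v in values) / n for g in grid]


def max_deviation_from_uniform(levels, grid_size=20):
    """Largest gap between the ECDF of the credible levels and the
    diagonal, evaluated on an evenly spaced grid over (0, 1]."""
    grid = [i / grid_size for i in range(1, grid_size + 1)]
    return max(abs(e - g) for e, g in zip(ecdf(levels, grid), grid))


# Hypothetical credible levels of the true tree over ten replicates.
calibrated = [i / 10 for i in range(1, 11)]
# Every true tree lands only in the 100% credible set.
overdispersed = [1.0] * 10

dev_calibrated = max_deviation_from_uniform(calibrated, grid_size=10)
dev_overdispersed = max_deviation_from_uniform(overdispersed, grid_size=10)
```

The ECDF value at a grid point α is the empirical coverage of the α credible set, so the same numbers can be read as a coverage table; a large deviation suggests the credible sets are too narrow or too wide.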

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Conditional Clade Distributions (CCDs)

Introduces the α credible CCD concept

Develops algorithms for credible CCDs
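
As background for the items above, here is a minimal sketch of how a CCD can assign probability to a tree, assuming the common parameterization in which each clade's split probability is estimated by its observed frequency in the sample. The CCD models evaluated in the paper may differ; the nested-tuple tree encoding and the names `fit_ccd` and `ccd_probability` are assumptions made for illustration.

```python
from collections import Counter


def leaves(tree):
    """Set of taxon names below a node; a leaf is a plain string."""
    if isinstance(tree, str):
        return frozenset([tree])
    return leaves(tree[0]) | leaves(tree[1])


def clade_splits(tree):
    """Yield (clade, split) for every internal node of a binary tree,
    where split is the unordered pair of child clades."""
    if isinstance(tree, str):
        return
    left, right = tree
    cl, cr = leaves(left), leaves(right)
    yield cl | cr, frozenset((cl, cr))
    yield from clade_splits(left)
    yield from clade_splits(right)


def fit_ccd(trees):
    """Count how often each clade and each clade split is observed."""
    clade_counts, split_counts = Counter(), Counter()
    for tree in trees:
        for clade, split in clade_splits(tree):
            clade_counts[clade] += 1
            split_counts[(clade, split)] += 1
    return clade_counts, split_counts


def ccd_probability(tree, ccd):
    """Product over internal nodes of P(split | clade)."""
    clade_counts, split_counts = ccd
    p = 1.0
    for clade, split in clade_splits(tree):
        if split_counts[(clade, split)] == 0:
            return 0.0  # tree uses a split never seen in the sample
        p *= split_counts[(clade, split)] / clade_counts[clade]
    return p


# Two sampled trees (hypothetical data) and one tree never sampled.
t1 = ((("A", "B"), "C"), (("D", "E"), "F"))
t2 = ((("A", "C"), "B"), (("D", "F"), "E"))
unsampled = ((("A", "B"), "C"), (("D", "F"), "E"))

ccd = fit_ccd([t1, t2])
p_unsampled = ccd_probability(unsampled, ccd)
```

The unsampled tree combines clade splits seen in different sampled trees, so it receives positive probability. This is what lets a CCD rank and assign credible levels to trees even when every sampled topology is unique.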

🔎 Similar Papers

Orientability of undirected phylogenetic networks to a desired class: Practical algorithms and application to tree-child orientation