🤖 AI Summary
In Bayesian phylogenetic inference, the tree topology space is exponentially large and posterior distributions are often diffuse, rendering conventional frequency-based approaches inadequate for constructing reliable credible sets. To address this, we propose a novel α-credible set framework based on Conditional Clade Distributions (CCDs): (i) we formally define the α-credible CCD, enabling interpretable quantification of support for individual trees and subtrees; (ii) we integrate the Highest Posterior Density (HPD) principle with simulated annealing sampling to overcome failure modes of standard methods under sparse posteriors; (iii) we introduce an ECDF-based visualization and rank-uniformity validation framework to check statistical calibration. Experiments on both simulated and empirical datasets evaluate the coverage accuracy of the resulting credible sets. Our algorithms efficiently compute credible levels for tree topologies and subtrees, supporting the ranking and diagnostic assessment of tree-topology credibility in Bayesian phylogenetics.
📝 Abstract
Credible intervals and credible sets, such as highest posterior density (HPD) intervals, form an integral statistical tool in Bayesian phylogenetics, both for phylogenetic analyses and for method development. While such sets are readily available for continuous parameters such as base frequencies and clock rates, the vast and complex space of tree topologies poses significant challenges for defining analogous credible sets. Traditional frequency-based approaches are inadequate for diffuse posteriors, where sampled trees are often unique. To address this, we introduce novel and efficient methods for estimating the credible level of individual tree topologies using tractable tree distributions, specifically Conditional Clade Distributions (CCDs). Furthermore, we propose a new concept, the $\alpha$ credible CCD: a CCD whose trees collectively account for probability $\alpha$. We present algorithms to compute these credible CCDs efficiently and to determine the credible levels of tree topologies as well as of subtrees. We evaluate the accuracy of these credible set methods using simulated and real datasets. Finally, to demonstrate the utility of our methods, we use well-calibrated simulation studies to evaluate the performance of different CCD models. In particular, we show how the credible set methods can be used to conduct rank-uniformity validation and to produce Empirical Cumulative Distribution Function (ECDF) plots, supplementing standard coverage analyses for continuous parameters.
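To make the notions of credible set and credible level concrete, the following is a minimal sketch on a toy example, not the algorithms of the paper. It assumes a hypothetical CCD over four taxa (A to D) in which the probability of a tree is the product of the conditional probabilities of its clade splits, and it enumerates all trees explicitly, which is only feasible for tiny examples. All names (`CCD`, `enumerate_trees`, `credible_set`, `credible_level`) and all probabilities are invented for illustration. The credible level of a tree is taken here to be the total probability of all trees at least as probable as it, which is one natural HPD-style convention and an assumption of this sketch.

```python
# Toy illustration (all values hypothetical): HPD-style credible sets over
# tree topologies whose probabilities come from a small CCD.

def clade(taxa):
    """Build a clade (set of taxa) from a string of single-character taxon names."""
    return frozenset(taxa)

# Hypothetical CCD: clade -> list of ((left child, right child), conditional probability).
# Clades with two taxa have a single trivial split and are handled in splits().
CCD = {
    clade("ABCD"): [((clade("AB"), clade("CD")), 0.6),
                    ((clade("ABC"), clade("D")), 0.3),
                    ((clade("A"), clade("BCD")), 0.1)],
    clade("ABC"): [((clade("AB"), clade("C")), 0.7),
                   ((clade("A"), clade("BC")), 0.3)],
    clade("BCD"): [((clade("BC"), clade("D")), 0.5),
                   ((clade("B"), clade("CD")), 0.5)],
}

def splits(c):
    """Return the clade splits of c with their conditional probabilities."""
    if c in CCD:
        return CCD[c]
    a, b = sorted(c)  # a two-taxon clade has exactly one possible split
    return [((frozenset([a]), frozenset([b])), 1.0)]

def enumerate_trees(c):
    """Yield (newick-like string, probability) for every tree on clade c."""
    if len(c) == 1:
        yield next(iter(c)), 1.0
        return
    for (left, right), p in splits(c):
        for left_tree, left_p in enumerate_trees(left):
            for right_tree, right_p in enumerate_trees(right):
                # Tree probability = product of conditional split probabilities.
                yield f"({left_tree},{right_tree})", p * left_p * right_p

def credible_set(dist, alpha):
    """Smallest set of most probable trees whose total mass reaches alpha."""
    chosen, mass = [], 0.0
    for tree, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        if mass >= alpha - 1e-12:
            break
        chosen.append(tree)
        mass += p
    return chosen, mass

def credible_level(dist, tree):
    """Total probability of all trees at least as probable as the given tree."""
    return sum(p for p in dist.values() if p >= dist[tree])

dist = dict(enumerate_trees(clade("ABCD")))
trees_80, mass_80 = credible_set(dist, 0.8)
print(trees_80, mass_80)
print(credible_level(dist, "(((A,B),C),D)"))
```

A tree lies in the $\alpha$ credible set under this convention exactly when its credible level is at most the mass of that set, so ranking trees by credible level does not require any tree to have been sampled more than once.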