🤖 AI Summary
This study addresses the NP-hard problem of maximizing phylogenetic diversity (PD) under a budget constraint in phylogenetic networks with non-uniform protection costs. The authors propose a parameterized algorithmic framework based on node scanwidth (nsw), a structural parameter that measures how tree-like a network is, and use it to optimize PD exactly under heterogeneous costs. They also introduce the first exact algorithms to compute nsw and its associated decomposition, combining data reduction rules, dynamic programming, and an integer linear programming formulation. Experiments on highly reticulated simulated networks with several hundred taxa show that both PD scores and optimal nsw values can be computed in fractions of a second, and that the budgeted algorithms substantially outperform existing baseline methods, which were limited to unit costs.
📝 Abstract
Identifying a subset of taxa that maximizes Phylogenetic Diversity (PD) is a cornerstone of quantitative conservation planning. Traditionally, PD is defined over a phylogenetic tree in which the leaves represent present-day taxa and the branch lengths capture the estimated evolutionary distinctiveness. While PD maximization is computationally tractable on trees with unit costs, the problem becomes NP-hard when moving to phylogenetic networks or to budgeted versions in which protecting taxa incurs non-homogeneous costs. In this paper, we address these two challenges by providing definitions and a comprehensive analysis of three distinct variants of budgeted PD on networks. We conduct our study through the lens of a structural parameter, node scanwidth (nsw), which measures the "tree-likeness" of a phylogenetic network and is small on tree-like networks. We show that two of the considered variants can be optimized in O*(2^nsw B^2) time, where B is the budget. For the computationally harder third variant, we provide an algorithm to compute PD scores in O*(3^nsw) time. Because the utility of algorithms based on nsw depends on the ability to compute nsw and its corresponding decomposition, we further contribute the first exact algorithms to compute node scanwidth. Our approaches integrate data reduction rules, dynamic programming, and an Integer Linear Programming formulation. We validate our theoretical results through extensive experiments on highly reticulated, simulated networks containing several hundred taxa, using heterogeneous costs. Our implementation computes PD scores and optimal nsw in fractions of a second, even on the most challenging instances. Furthermore, our budgeted optimization algorithms significantly outperform existing benchmarks for computing PD on networks, which were previously limited to unit-cost scenarios. The software makes analyses even on networks with a thousand taxa tracta...
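
To make the budgeted objective concrete, the following is a minimal Python sketch of PD on a rooted tree, the simpler setting the abstract starts from. It is not the authors' nsw-based algorithm: the toy tree, the taxon names, the integer costs, and the helper names `phylogenetic_diversity` and `best_subset` are all assumptions made for illustration, and the search is an exhaustive enumeration that is only viable for a handful of taxa.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length). "root" has no entry.
PARENT = {
    "x": ("root", 1.0),
    "A": ("x", 2.0),
    "B": ("x", 3.0),
    "C": ("root", 4.0),
}
# Hypothetical, non-uniform integer protection costs per taxon.
COST = {"A": 1, "B": 2, "C": 2}


def phylogenetic_diversity(taxa):
    """Sum the lengths of all edges lying on some root-to-taxon path."""
    covered = set()  # each edge is identified by its child endpoint
    for node in taxa:
        # Walk towards the root; stop early once an edge is already counted.
        while node in PARENT and node not in covered:
            covered.add(node)
            node = PARENT[node][0]
    return sum(PARENT[n][1] for n in covered)


def best_subset(budget):
    """Exhaustive search over taxon subsets within the budget (toy sizes only)."""
    best = (0.0, ())
    leaves = sorted(COST)
    for k in range(len(leaves) + 1):
        for subset in combinations(leaves, k):
            if sum(COST[t] for t in subset) <= budget:
                best = max(best, (phylogenetic_diversity(subset), subset))
    return best


if __name__ == "__main__":
    print(best_subset(3))
```

With a budget of 3, protecting B and C together is too expensive (cost 4), so the best affordable choice in this toy instance is A and C. Shared edges such as `x -> root` are counted once, which is why PD is not simply a sum of per-taxon values and why the budgeted problem does not reduce to a plain knapsack.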