🤖 AI Summary
This work proposes a unified framework for efficiently identifying bubble-like subgraphs—specifically superbubbles, snarls, and ultrabubbles—associated with genomic variation in both directed and bidirected graphs. Leveraging SPQR tree decomposition, the method integrates dynamic programming traversal, feedback arc set computation, and 2-separator analysis to achieve, for the first time, linear-time identification of snarls and ultrabubbles, along with a linear-size representation of snarls. Notably, the study proves that the feedback arc set problem can be solved in linear time on endpoint-free bidirected graphs. By unifying the detection of all three subgraph types within a single linear-time algorithm, this approach substantially enhances the efficiency of genomic graph structural analysis.
📝 Abstract
A fundamental algorithmic problem in computational biology is to find all subgraphs of a given type (superbubbles, snarls, and ultrabubbles) in a directed or bidirected input graph. These correspond to regions of genetic variation and are useful in analyzing collections of genomes. We present the first linear-time algorithms for identifying all snarls and all ultrabubbles, resolving problems open since 2018. The algorithm for snarls relies on a new linear-size representation of all snarls with respect to the number of vertices in the graph. We employ the well-known SPQR-tree decomposition, which encodes all 2-separators of a biconnected graph. After several dynamic-programming-style traversals of this tree, we maintain key properties (such as acyclicity) that allow us to decide whether a given 2-separator defines a subgraph to be reported. A crucial ingredient for linear-time complexity is that acyclicity of linearly many subgraphs can be tested simultaneously via the problem of computing all arcs in a directed graph whose removal renders it acyclic (so-called feedback arcs). As such, we prove a fundamental result for bidirected graphs, that may be of independent interest: all feedback arcs can be computed in linear time for tipless bidirected graphs, while in general this is at least as hard as matrix multiplication, assuming the k-Clique Conjecture. Our results form a unified framework that also yields a completely different linear-time algorithm for finding all superbubbles. Although some of the results are technically involved, the underlying ideas are conceptually simple, and may extend to other bubble-like subgraphs. More broadly, our work contributes to the theoretical foundations of computational biology and advances a growing line of research that uses SPQR-tree decompositions as a general tool for designing efficient algorithms, beyond their traditional role in graph drawing.