Merging RLBWTs adaptively

📅 2025-11-21
🤖 AI Summary
This paper addresses the efficient merging of run-length compressed Burrows–Wheeler transforms (RLBWTs). We propose a new algorithm with space complexity $O(R)$ and time complexity $\tilde{O}(L + \sigma + R)$, where $R$ is the total number of runs in the input RLBWTs, $\sigma$ is the alphabet size, and $L$ is the sum of the longest common prefix (LCP) values at block boundaries. Our key methodological innovation is the introduction of the *boundary LCP*, which enables adaptive acceleration: for highly repetitive yet divergent string collections, such as pangenomes or multi-reference genomes, $L$ is conjectured to be much smaller than conventional measures (e.g., total text length). The algorithm builds upon the extended BWT (eBWT) framework, integrating character-block boundary detection with boundary LCP analysis. Experiments demonstrate that our approach significantly outperforms state-of-the-art methods when $L$ is small, providing both theoretical foundations and practical tools for constructing compact, scalable indexes over large-scale repetitive sequence data.
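
To make the run count $R$ concrete, here is a minimal Python sketch that builds a BWT naively from sorted rotations and run-length encodes it. The function names `bwt` and `run_length_encode` are hypothetical, the input is assumed to end with a unique sentinel `$`, and the quadratic construction is for illustration only; it is not how the paper builds or stores RLBWTs.

```python
from itertools import groupby


def bwt(text: str) -> str:
    """Naive BWT: sort all rotations and read off the last column.

    Assumes `text` ends with a unique sentinel '$' that sorts before
    every other character.
    """
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    transformed = bwt("banana$")
    runs = run_length_encode(transformed)
    print(transformed)  # annb$aa
    print(runs)         # [('a', 1), ('n', 2), ('b', 1), ('$', 1), ('a', 2)]
    print(len(runs))    # 5 runs for this single string
```

For a repetitive text the number of runs is usually far smaller than the text length, which is why a space bound stated in terms of $R$ is attractive.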

📝 Abstract
We show how to merge run-length compressed Burrows-Wheeler Transforms (RLBWTs) quickly and in $O(R)$ space, where $R$ is the total number of runs in them, when a certain parameter is small. Specifically, we consider the boundaries in their combined extended Burrows-Wheeler Transform (eBWT) between blocks of characters from the same original RLBWT, and denote by $L$ the sum of the longest common prefix (LCP) values at those boundaries. We show how to merge the RLBWTs in $\tilde{O}(L + \sigma + R)$ time, where $\sigma$ is the alphabet size. We conjecture that $L$ tends to be small when the strings (or sets of strings) underlying the original RLBWTs are repetitive but dissimilar.
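
The definition of $L$ above can be illustrated with a brute-force Python sketch for two single cyclic strings. It reflects one reading of the abstract and is not the paper's algorithm: rotations are ordered by their infinite repetition, approximated here by a prefix of length `len(a) + len(b)`, and the two strings are assumed to be primitive and not rotations of each other, so that no two keys tie. The names `boundary_lcp_sum`, `rotation_rows`, and `lcp` are hypothetical.

```python
def rotation_rows(s: str, source: int, m: int) -> list[tuple[str, int, str]]:
    """Return (sort key, source id, last character) for every rotation of s.

    The key is the first m characters of the rotation repeated forever,
    which is enough to order rotations of two strings of total length m.
    """
    rows = []
    for i in range(len(s)):
        rot = s[i:] + s[:i]
        key = (rot * (m // len(rot) + 1))[:m]
        rows.append((key, source, rot[-1]))
    return rows


def lcp(x: str, y: str) -> int:
    """Length of the longest common prefix of x and y."""
    n = 0
    for cx, cy in zip(x, y):
        if cx != cy:
            break
        n += 1
    return n


def boundary_lcp_sum(a: str, b: str) -> tuple[str, int]:
    """Return the merged eBWT of {a, b} and the sum of LCPs at block boundaries.

    A boundary is a pair of adjacent rows whose characters come from
    different inputs.
    """
    m = len(a) + len(b)
    rows = sorted(rotation_rows(a, 0, m) + rotation_rows(b, 1, m))
    merged = "".join(last for _, _, last in rows)
    total = sum(
        lcp(prev[0], cur[0])
        for prev, cur in zip(rows, rows[1:])
        if prev[1] != cur[1]
    )
    return merged, total


if __name__ == "__main__":
    print(boundary_lcp_sum("ab", "cd"))  # ('badc', 0)
    print(boundary_lcp_sum("ab", "ac"))  # ('bcaa', 1)
```

In the first call the two strings share no characters, so the merged eBWT has a single boundary with LCP 0. In the second call the rotations interleave, giving three boundaries with LCP values 1, 0, and 0.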
Problem

Research questions and friction points this paper is trying to address.

Merging run-length compressed Burrows-Wheeler Transforms (RLBWTs) efficiently
Keeping the merge within $O(R)$ space, where $R$ is the total number of runs
Developing an algorithm whose running time depends on the sum $L$ of boundary LCP values
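
As a rough illustration of why the block structure matters, the Python sketch below merges two run-length encoded BWTs when the block decomposition of the merged eBWT is already known. This is an assumption made for the example: finding the block boundaries is the hard part that the paper addresses, and the function `merge_by_blocks` and its input format are hypothetical rather than taken from the paper.

```python
Run = tuple[str, int]     # (character, run length)
Block = tuple[int, int]   # (source index 0 or 1, block length)


def merge_by_blocks(rl_a: list[Run], rl_b: list[Run], blocks: list[Block]) -> list[Run]:
    """Interleave two run-length encoded sequences according to `blocks`.

    Each block says how many consecutive characters to copy from one input.
    Work is proportional to the number of input runs plus the number of
    blocks, not to the total number of characters.
    """
    inputs = [rl_a, rl_b]
    pos = [0, 0]    # index of the current run in each input
    used = [0, 0]   # characters already consumed from that run
    merged: list[Run] = []
    for source, length in blocks:
        remaining = length
        while remaining > 0:
            ch, run_len = inputs[source][pos[source]]
            take = min(remaining, run_len - used[source])
            if merged and merged[-1][0] == ch:
                # Extend the previous output run instead of starting a new one.
                merged[-1] = (ch, merged[-1][1] + take)
            else:
                merged.append((ch, take))
            remaining -= take
            used[source] += take
            if used[source] == run_len:
                pos[source] += 1
                used[source] = 0
    return merged


if __name__ == "__main__":
    # eBWT of {"ab"} is "ba", eBWT of {"ac"} is "ca"; their rotations
    # alternate in sorted order, giving four blocks of length 1.
    blocks = [(0, 1), (1, 1), (0, 1), (1, 1)]
    print(merge_by_blocks([("b", 1), ("a", 1)], [("c", 1), ("a", 1)], blocks))
    # [('b', 1), ('c', 1), ('a', 2)]
```

The output never holds more runs than the inputs and the block list together, which is consistent with aiming for space proportional to the number of runs.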
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive merging of run-length compressed BWTs
An algorithm that uses only $O(R)$ space
$\tilde{O}(L + \sigma + R)$ running time, which is fast when the parameter $L$ is small