String Partition for Building Long Burrows-Wheeler Transforms

📅 2024-06-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Constructing the Burrows-Wheeler transform (BWT) of a long string incurs substantial time and space overheads, which limits scalability. To address this, the authors propose a string partitioning strategy guided by a prefix of the suffix array: the long sequence is divided into shorter substrings, which are then processed with multi-string BWT construction algorithms. This is reported to be the first work to use suffix array prefixes to guide partitioning. Combined with IBB and implemented for DNA sequences as partDNA, the method is reported to keep memory consumption below 1.5× the input length and to construct the BWT up to 3.2× faster on real genomic datasets than state-of-the-art tools. The partitioning strategy itself is general-purpose and applies to strings over any alphabet.

📝 Abstract
Constructing the Burrows-Wheeler transform (BWT) for long strings poses significant challenges regarding construction time and memory usage. We use a prefix of the suffix array to partition a long string into shorter substrings, thereby enabling the use of multi-string BWT construction algorithms to process these partitions quickly. We provide an implementation, partDNA, for DNA sequences. Through comparison with state-of-the-art BWT construction algorithms, we show that partDNA with IBB offers a novel trade-off between construction time and memory usage on real genome datasets. Beyond this, the proposed partitioning strategy is applicable to strings over any alphabet.
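
To make the terms in the abstract concrete, here is a minimal Python sketch of a naive BWT built from a suffix array, followed by a toy illustration of what "a prefix of the suffix array" refers to. It is not partDNA and not the paper's algorithm: the helper `partition_at_sa_prefix`, its parameter `k`, and the rule of cutting the string at the first `k` suffix array entries are assumptions made purely for illustration, since this page does not describe the actual partitioning rule or how the pieces are processed afterwards.

```python
def suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt(text: str, sentinel: str = "$") -> str:
    """BWT of text + sentinel: the character preceding each sorted suffix."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    # s[i - 1] wraps around to the sentinel when i == 0.
    return "".join(s[i - 1] for i in suffix_array(s))


def partition_at_sa_prefix(text: str, k: int, sentinel: str = "$") -> list[str]:
    """Hypothetical helper: cut text at the first k suffix array entries.

    Assumption for illustration only; the paper's real cut rule may differ.
    """
    s = text + sentinel
    # Keep only positions that fall strictly inside the text.
    cuts = sorted(p for p in suffix_array(s)[:k] if 0 < p < len(text))
    bounds = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    print(bwt("banana"))                        # annb$aa
    print(partition_at_sa_prefix("banana", 3))  # ['ban', 'an', 'a']
```

The naive sort compares whole suffixes and is only suitable for short inputs; it is used here because it keeps the definition visible. In the toy partition, every piece after the first begins at the start of one of the lexicographically smallest suffixes, which is the sense in which the suffix array prefix guides the cuts in this sketch.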
Problem

Research questions and friction points this paper is trying to address.

Efficient BWT construction for long strings
Reduce memory usage in BWT algorithms
General partitioning strategy for any alphabet
Innovation

Methods, ideas, or system contributions that make the work stand out.

Partition long strings using suffix array prefixes
Utilize multi-string BWT algorithms for efficiency
Applicable to strings over any alphabet