r*-indexing

📅 2025-08-18

🤖 AI Summary

Problem: Full-text indexes for highly repetitive texts must trade space against query time. Method: This paper describes a compressed full-text index whose size is bounded by two repetitiveness measures: *r*\*, the sum of the numbers of runs in the Burrows–Wheeler Transforms (BWTs) of the text and of its reverse, and *z*, the number of phrases in the LZ77 parse of the text. Contribution/Results: For a text of length *n* over an alphabet of size polylog(*n*), the index occupies *O*(*r*\* log(*n*/*r*\*) + *z* log *n*) bits. Given a pattern of length *m*, it reports the locations of all occ occurrences in *O*(*m* log *n* + occ log^ε *n*) time, and the positions of the leftmost and rightmost occurrences in *O*(*m* log^ε *n*) time within the same space.

📝 Abstract

Let $T[1..n]$ be a text over an alphabet of size $\sigma \in \mathrm{polylog}(n)$, let $r^*$ be the sum of the numbers of runs in the Burrows-Wheeler Transforms of $T$ and its reverse, and let $z$ be the number of phrases in the LZ77 parse of $T$. We show how to store $T$ in $O(r^* \log(n / r^*) + z \log n)$ bits such that, given a pattern $P[1..m]$, we can report the locations of the $\mathrm{occ}$ occurrences of $P$ in $T$ in $O(m \log n + \mathrm{occ} \log^\epsilon n)$ time. We can also report the positions of the leftmost and rightmost occurrences of $P$ in $T$ in the same space and $O(m \log^\epsilon n)$ time.
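
The two measures in the abstract can be computed by brute force on a small text. The sketch below is only an illustration of the definitions, not the paper's construction: the function names are hypothetical, the BWT is built by sorting all rotations of the text with an appended sentinel (so the sentinel contributes to the run count), and the LZ77 variant assumed here makes each phrase either a single new character or the longest prefix of the remaining text that also starts at an earlier position, with overlaps allowed. Other LZ77 variants can give slightly different values of $z$.

```python
def bwt(text: str, sentinel: str = "\x00") -> str:
    """Burrows-Wheeler Transform via sorted rotations (quadratic, for illustration only)."""
    assert sentinel not in text, "sentinel must not occur in the text"
    s = text + sentinel  # the sentinel is assumed smaller than every text character
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def r_star(text: str) -> int:
    """Sum of the BWT run counts of the text and of its reverse."""
    return count_runs(bwt(text)) + count_runs(bwt(text[::-1]))


def lz77_phrases(text: str) -> int:
    """Number of phrases in a greedy LZ77 parse (assumed variant, overlaps allowed)."""
    n, i, z = len(text), 0, 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a new character forms a phrase of length 1
        z += 1
    return z


if __name__ == "__main__":
    for t in ["banana", "ab" * 16]:
        print(t, "r* =", r_star(t), "z =", lz77_phrases(t))
```

On repetitive inputs such as `"ab" * 16`, both quantities stay small relative to the text length, which is the regime where a bound of the form $O(r^* \log(n / r^*) + z \log n)$ bits is attractive.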

Problem

Research questions and friction points this paper is trying to address.

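Store a text T in O(r* log(n/r*) + z log n) bits, where r* counts the BWT runs of T and its reverse and z counts LZ77 phrases.

Report the locations of all occ occurrences of a pattern P in T in O(m log n + occ log^ε n) time.

Report the positions of the leftmost and rightmost occurrences of P in T in O(m log^ε n) time.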
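
To make the queries concrete, here is an uncompressed baseline that answers the same questions with a plain suffix array and binary search. It is not the paper's index: it stores one integer per text position rather than working in compressed space, its function names are hypothetical, and the naive construction is only suitable for small inputs. Positions are 0-based here, whereas the abstract indexes the text from 1.

```python
from typing import List, Optional, Tuple


def build_suffix_array(text: str) -> List[int]:
    """Naive suffix array: starting positions sorted by suffix (illustration only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def locate(text: str, sa: List[int], pattern: str) -> List[int]:
    """All 0-based starting positions of pattern in text, in increasing order."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


def leftmost_rightmost(text: str, sa: List[int], pattern: str) -> Optional[Tuple[int, int]]:
    """Leftmost and rightmost occurrence positions, or None if there is no occurrence."""
    occurrences = locate(text, sa, pattern)
    if not occurrences:
        return None
    return occurrences[0], occurrences[-1]


if __name__ == "__main__":
    text = "abracadabra"
    sa = build_suffix_array(text)
    print(locate(text, sa, "abra"))
    print(leftmost_rightmost(text, sa, "a"))
```

This baseline finds the leftmost and rightmost occurrences by first listing every occurrence, so its cost grows with occ; the bound stated for the index, O(m log^ε n), does not depend on occ.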

Innovation

Methods, ideas, or system contributions that make the work stand out.

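Bounds index size by r*, the total number of runs in the BWTs of the text and its reverse

Combines that run-based term with z, the number of LZ77 phrases, in a single space bound

Supports locate, leftmost-occurrence, and rightmost-occurrence queries within the same compressed space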
