Explaining the Inherent Tradeoffs for Suffix Array Functionality: Equivalences between String Problems and Prefix Range Queries

📅 2025-10-22

📈 Citations: 2

✨ Influential: 0


🤖 AI Summary

This paper addresses how efficiently suffix array (SA) entries can be accessed when the array cannot be stored explicitly. It presents the first *bidirectional reduction* between SA queries and prefix-select queries, showing that the two are equivalent in all parameters (space, query time, and construction efficiency) up to an additive *O(log log n)* term in query time. Analogous equivalences are proved for inverse suffix array, pattern ranking, lexicographic range, and SA-interval queries, yielding six core problem pairs that connect the string and prefix query models. As a consequence, essentially all efficient SA representations can be expressed via prefix-select structures. This covers the earlier prefix-select-based structure for binary texts, which uses *O(n)* bits of space, *O(n/√log n)* preprocessing time, and *O(log^ε n)* query time, and it resolves the open question of whether different techniques could improve on those bounds.


📝 Abstract

We study the fundamental question of how efficiently suffix array entries can be accessed when the array cannot be stored explicitly. The suffix array $SA_T[1..n]$ of a text $T$ of length $n$ encodes the lexicographic order of its suffixes and underlies numerous applications in pattern matching, data compression, and bioinformatics. Previous work established one-way reductions showing how suffix array queries can be answered using, for example, rank queries on the Burrows-Wheeler Transform. More recently, a new class of prefix queries was introduced, together with reductions that, among others, transform a simple tradeoff for prefix-select queries into a suffix array tradeoff matching state-of-the-art space and query-time bounds, while achieving sublinear construction time. For binary texts, the resulting data structure achieves space $O(n)$ bits, preprocessing time $O(n / \sqrt{\log n})$, preprocessing space of $O(n)$ bits, and query time $O(\log^{\epsilon} n)$ for any constant $\epsilon > 0$. However, whether these bounds could be improved using different techniques has remained open. We resolve this question by presenting the first bidirectional reduction showing that suffix array queries are, up to an additive $O(\log\log n)$ term in query time, equivalent to prefix-select queries in all parameters. This result unifies prior approaches and shows that essentially all efficient suffix array representations can be expressed via prefix-select structures. Moreover, we prove analogous equivalences for inverse suffix array queries, pattern ranking, lexicographic range, and SA-interval queries, identifying six core problem pairs that connect string and prefix query models. Our framework thus provides a unified foundation for analyzing and improving the efficiency of fundamental string-processing problems through the lens of prefix queries.
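To make the objects in the abstract concrete, here is a minimal Python sketch of the suffix array, the inverse suffix array, and an SA-interval query. It is a naive reference built by sorting all suffixes, not the compact structure the paper studies. The function names (`suffix_array`, `inverse_suffix_array`, `sa_interval`) and the example text are illustrative assumptions, not taken from the paper.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Return the 1-based suffix array of text by sorting all suffixes (naive)."""
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def inverse_suffix_array(sa):
    """Return isa with isa[i - 1] = 1-based lexicographic rank of the suffix starting at i."""
    isa = [0] * len(sa)
    for rank, start in enumerate(sa, start=1):
        isa[start - 1] = rank
    return isa


def sa_interval(text, sa, pattern):
    """Return the half-open 0-based range [lo, hi) of sa whose suffixes start with pattern."""
    m = len(pattern)
    # Truncating lexicographically sorted suffixes to length m keeps them sorted,
    # so binary search on the truncated list is valid.
    prefixes = [text[i - 1:i - 1 + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


text = "banana"
sa = suffix_array(text)
isa = inverse_suffix_array(sa)
lo, hi = sa_interval(text, sa, "ana")
occurrences = sorted(sa[lo:hi])  # 1-based starting positions of "ana" in text
```

This version stores the whole array explicitly, which for a binary text costs far more than the $O(n)$ bits discussed in the abstract; it is only meant to pin down what each query returns.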

Problem

Research questions and friction points this paper is trying to address.

Studying efficient access to suffix array entries without explicit storage

Establishing equivalences between suffix array queries and prefix-select queries

Providing a unified framework for analyzing string-processing problems via prefix queries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bidirectional reduction between suffix array and prefix-select queries

Unified framework for string problems via prefix queries

Equivalence proofs for six core problem pairs connecting string queries and prefix queries
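The page does not define prefix-select queries, so the following Python sketch relies on an assumed formulation: given a sequence of strings, `prefix_rank(strings, i, x)` counts how many of the first `i` strings have `x` as a prefix, and `prefix_select(strings, r, x)` returns the 1-based index of the `r`-th string that has `x` as a prefix. Both function names, the linear-scan implementation, and the sample data are hypothetical illustrations. The paper's exact definition and data structures may differ.

```python
def prefix_rank(strings, i, x):
    """Count the strings among the first i that start with x (assumed definition)."""
    return sum(1 for w in strings[:i] if w.startswith(x))


def prefix_select(strings, r, x):
    """Return the 1-based index of the r-th string starting with x, or None if absent."""
    count = 0
    for j, w in enumerate(strings, start=1):
        if w.startswith(x):
            count += 1
            if count == r:
                return j
    return None


# Hypothetical sequence of short binary strings.
words = ["010", "011", "110", "010", "001"]
third_with_01 = prefix_select(words, 3, "01")    # index of the third string starting with "01"
count_in_first_three = prefix_rank(words, 3, "01")
```

Under this assumed definition the two queries are inverses of each other for a fixed prefix, much as rank and select are on bitvectors; the linear scan stands in for whatever succinct structure would answer them in practice.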
