Incongruity-sensitive access to highly compressed strings

📅 2026-02-04

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

This work addresses the tension between high compression and fast random access in highly compressed strings by introducing incongruity-sensitive access. It links a character's local incompressibility, measured by the length of the longest repeated substring containing it, to query speed: the less compressible a character's surroundings, the faster it can be accessed. Given a run-length straight-line program (RLSLP) of size g_rl or a block tree of size L, the proposed data structures use O(g_rl) or O(L) space, respectively, and access any character in time logarithmic in the length of the longest repeated substring containing it. This does not contradict the known lower bound, which rules out strongly sublogarithmic access in the worst case; instead, the access time adapts to the queried character, while the space stays within a constant factor of the size of the compressed representation. A second, more powerful result covers parsings in which phrases' sources do not overlap much larger phrases; there, the query time also depends on the number of phrases that must be copied from their sources to obtain the queried character.
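To make the quantity that governs access time concrete, here is a brute-force sketch that computes, for every position i, the length of the longest substring containing i that occurs at least twice (possibly overlapping) in the string. It is for intuition only: the function names are hypothetical, it runs in polynomial but far from efficient time, and it is not the paper's data structure, which never materialises this profile.

```python
def occurs_at_least_twice(s: str, t: str) -> bool:
    """True if t occurs at two or more (possibly overlapping) positions of s."""
    first = s.find(t)
    return first != -1 and s.find(t, first + 1) != -1


def longest_repeat_containing(s: str, i: int) -> int:
    """Length of the longest substring s[a:b] with a <= i < b that occurs at
    least twice in s; 0 if the character s[i] itself occurs only once."""
    n = len(s)
    best = 0
    for a in range(i + 1):
        for b in range(i + 1, n + 1):
            # Extending a non-repeated substring can never make it repeated,
            # so the first failure ends the scan for this start position a.
            if not occurs_at_least_twice(s, s[a:b]):
                break
            best = max(best, b - a)
    return best


def incongruity_profile(s: str) -> list[int]:
    """Longest-repeat length for every position of s (smaller = more incongruous)."""
    return [longest_repeat_containing(s, i) for i in range(len(s))]


if __name__ == "__main__":
    print(incongruity_profile("abracadabra"))  # [4, 4, 4, 4, 0, 1, 0, 4, 4, 4, 4]
```

In "abracadabra", every character inside either copy of "abra" gets 4, while the unique c and d get 0: these are the most incongruous characters, which the data structures described above would be able to reach fastest.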

📝 Abstract

Random access to highly compressed strings, represented for example by straight-line programs or Lempel-Ziv parses, is a well-studied topic. Random access to such strings in strongly sublogarithmic time is impossible in the worst case, but previous authors have shown how to support faster access to specific characters and their neighbourhoods. In this paper we explore whether, since better compression can impede access, we can support faster access to relatively incompressible substrings of highly compressed strings. We first show how, given a run-length compressed straight-line program (RLSLP) of size $g_{rl}$ or a block tree of size $L$, we can build an $O(g_{rl})$-space or an $O(L)$-space data structure, respectively, that supports access to any character in time logarithmic in the length of the longest repeated substring containing that character. That is, the more incongruous a character is with respect to the characters around it in a certain sense, the faster we can support access to it. We then prove a similar but more powerful and sophisticated result for parsings in which phrases' sources do not overlap much larger phrases, with the query time depending also on the number of phrases we must copy from their sources to obtain the queried character.
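For context, the baseline that the abstract improves on is plain top-down access in an RLSLP: store the expansion length of every nonterminal, then descend from the start symbol, choosing the left or right child of a binary rule and reducing the position modulo the child's length for a run rule X -> Y^k. The sketch below assumes a hypothetical tuple encoding of rules (not the paper's) and counts how many rules it descends; it assumes 0 <= i < |expansion of start| and does no bounds checking.

```python
# Hypothetical rule encoding (an assumption, not taken from the paper):
#   ("term", c)     X -> c
#   ("pair", Y, Z)  X -> Y Z
#   ("run", Y, k)   X -> Y^k  (Y repeated k times)

def expansion_lengths(rules):
    """Return the length of each nonterminal's expansion."""
    lengths = {}

    def length(x):
        if x not in lengths:
            r = rules[x]
            if r[0] == "term":
                lengths[x] = 1
            elif r[0] == "pair":
                lengths[x] = length(r[1]) + length(r[2])
            else:  # run rule: k copies of the same child
                lengths[x] = length(r[1]) * r[2]
        return lengths[x]

    for x in rules:
        length(x)
    return lengths


def access(rules, lengths, start, i):
    """Return (character at position i of start's expansion, rules descended)."""
    x, steps = start, 0
    while rules[x][0] != "term":
        r = rules[x]
        if r[0] == "pair":
            _, y, z = r
            if i < lengths[y]:
                x = y
            else:
                i -= lengths[y]
                x = z
        else:
            _, y, _k = r
            i %= lengths[y]  # every copy of Y is identical, so reduce modulo |Y|
            x = y
        steps += 1
    return rules[x][1], steps


rules = {
    "A": ("term", "a"), "B": ("term", "b"), "C": ("term", "c"),
    "P": ("pair", "A", "B"),   # P -> ab
    "Q": ("run", "P", 4),      # Q -> P^4 = abababab
    "S": ("pair", "Q", "C"),   # S -> ababababc
}
lengths = expansion_lengths(rules)
print("".join(access(rules, lengths, "S", i)[0] for i in range(lengths["S"])))  # ababababc
print(access(rules, lengths, "S", 8), access(rules, lengths, "S", 5))  # ('c', 1) ('b', 3)
```

In this toy grammar the unique c happens to sit one rule below the root. In general, though, the number of steps in this naive descent is bounded only by the grammar's height. The paper's contribution is a structure whose access time instead depends on the longest repeated substring containing the queried character.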

Problem

Research questions and friction points this paper is trying to address.

incongruity-sensitive access

highly compressed strings

random access

straight-line programs

Lempel-Ziv parses

Innovation

Methods, ideas, or system contributions that make the work stand out.

incongruity-sensitive access

highly compressed strings

straight-line programs