Generalization of Repetitiveness Measures for Two-Dimensional Strings

📅 2025-05-15

🏛️ SPIRE

🤖 AI Summary

This work addresses the lack of established repetitiveness measures for two- and higher-dimensional strings. We generalize the classical one-dimensional measures (δ, γ, g, g_rl, b) to d-dimensional strings (d ≥ 2) and separate them into two classes: δ and γ, which are defined in terms of the substrings of the input, and g, g_rl, and b, which are based on copy-paste mechanisms. We show that the two classes become incomparable as soon as d ≥ 2, so neither class dominates the other. We also show that our grammar-based representation of a d-dimensional string of size N supports direct access to any symbol in O(log N) time. Finally, we compare the proposed two-dimensional measures with the 2D Block Tree data structure and draw insights for the design of future two-dimensional compressors.
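
To make γ concrete: in one dimension, a string attractor [Kempa and Prezza, STOC 2018] is a set of positions such that every distinct substring has at least one occurrence containing one of those positions, and γ is the size of a smallest such set. The sketch below assumes the natural two-dimensional analogue (submatrices instead of substrings, cells instead of positions). The helper names `occurrences`, `is_attractor`, and `smallest_attractor_size` are hypothetical, and the exhaustive search is exponential, so it is only meant for toy inputs.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (row, column)


def occurrences(M: Sequence[str], k1: int, k2: int) -> Dict[tuple, List[Cell]]:
    """Map each distinct k1 x k2 submatrix of M to the top-left corners of its occurrences.

    Rows of M are given as equal-length strings.
    """
    occ: Dict[tuple, List[Cell]] = {}
    for i in range(len(M) - k1 + 1):
        for j in range(len(M[0]) - k2 + 1):
            block = tuple(M[i + r][j:j + k2] for r in range(k1))
            occ.setdefault(block, []).append((i, j))
    return occ


def is_attractor(M: Sequence[str], gamma: Set[Cell]) -> bool:
    """Assumed 2D attractor test: every distinct submatrix (of every shape)
    must have some occurrence whose area contains a cell of gamma."""
    rows, cols = len(M), len(M[0])
    for k1 in range(1, rows + 1):
        for k2 in range(1, cols + 1):
            for corners in occurrences(M, k1, k2).values():
                covered = any(i <= p < i + k1 and j <= q < j + k2
                              for (i, j) in corners for (p, q) in gamma)
                if not covered:
                    return False
    return True


def smallest_attractor_size(M: Sequence[str]) -> int:
    """Brute-force gamma: try all cell subsets in order of increasing size."""
    cells = [(p, q) for p in range(len(M)) for q in range(len(M[0]))]
    for size in range(1, len(cells) + 1):
        for candidate in combinations(cells, size):
            if is_attractor(M, set(candidate)):
                return size
    return len(cells)  # not reached for non-empty M: all cells always form an attractor
```

For example, on the 2 × 2 matrix with rows `ab` and `ab`, no single cell suffices because the 1 × 1 blocks `a` and `b` each need a covered occurrence, while the two cells of the first row do form an attractor.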

📝 Abstract

The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction of the notion of string attractor [Kempa and Prezza, STOC 2018] and by the results showing the relationship between attractors and other measures of compressibility. When the input data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy often offers an even richer source for compression. However, systematic studies on repetitiveness measures for two-dimensional strings are still scarce. In this paper we extend to two or more dimensions the main measures of complexity introduced for one-dimensional strings. We distinguish between the measures $\delta$ and $\gamma$, defined in terms of the substrings of the input, and the measures $g$, $g_{rl}$, and $b$, which are based on copy-paste mechanisms. We study the properties and mutual relationships between these two classes and we show that the two classes become incomparable for $d$-dimensional inputs as soon as $d \geq 2$. Moreover, we show that our grammar-based representation of a $d$-dimensional string of size $N$ enables direct access to any symbol in $O(\log N)$ time. We also compare our measures for two-dimensional strings with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024] and provide some insights for the design of future effective two-dimensional compressors.
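
The measure δ is defined through substring counts: in one dimension, δ = max over k of d_k / k, where d_k is the number of distinct length-k substrings. Below is a minimal sketch of one plausible two-dimensional analogue. It assumes that δ is the maximum, over all block shapes k1 × k2, of the number of distinct k1 × k2 submatrices divided by the area k1 · k2; the paper's exact normalization may differ, and the function names are hypothetical. The naive enumeration only illustrates the definition and makes no attempt at efficiency.

```python
from fractions import Fraction
from typing import Sequence


def count_distinct(M: Sequence[str], k1: int, k2: int) -> int:
    """Number of distinct k1 x k2 submatrices of M (rows given as equal-length strings)."""
    return len({tuple(M[i + r][j:j + k2] for r in range(k1))
                for i in range(len(M) - k1 + 1)
                for j in range(len(M[0]) - k2 + 1)})


def delta_2d(M: Sequence[str]) -> Fraction:
    """Assumed 2D delta: max over all shapes of (#distinct k1 x k2 blocks) / (k1 * k2).

    Fractions keep the ratios exact instead of rounding to floats.
    """
    rows, cols = len(M), len(M[0])
    return max(Fraction(count_distinct(M, k1, k2), k1 * k2)
               for k1 in range(1, rows + 1)
               for k2 in range(1, cols + 1))
```

On a single-row matrix only k1 = 1 is possible, so this function reduces to the one-dimensional δ.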

Problem

Research questions and friction points this paper is trying to address.

Extending 1D string repetitiveness measures to 2D strings

Comparing the substring-based measures δ and γ with the copy-paste-based measures g, g_rl, and b

Enabling O(log N) access in grammar-based d-dimensional representations
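
The following is a minimal sketch of how direct access works in a grammar-compressed two-dimensional string. It assumes a simplified, acyclic 2D straight-line grammar (the names `Concat` and `Grammar2D` are hypothetical): every nonterminal is either a single terminal cell or the horizontal (`"H"`) or vertical (`"V"`) concatenation of two blocks with matching height or width. This is not necessarily the grammar defined in the paper. Access walks a single root-to-leaf path, so its cost is proportional to the height of the derivation; the sketch reaches O(log N) only when that height is O(log N), and it does not reproduce how the paper obtains its bound.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass(frozen=True)
class Concat:
    """Binary rule: 'H' places left | right side by side, 'V' stacks left over right."""
    op: str
    left: str
    right: str


Rule = Union[str, Concat]  # a plain str is a terminal cell (1 x 1 block)


class Grammar2D:
    def __init__(self, rules: Dict[str, Rule], start: str) -> None:
        self.rules = rules  # assumed acyclic (straight-line)
        self.start = start
        self.dims: Dict[str, Tuple[int, int]] = {}
        self._dims(start)

    def _dims(self, x: str) -> Tuple[int, int]:
        """Compute (height, width) of the block derived from x, memoized."""
        if x not in self.dims:
            r = self.rules[x]
            if isinstance(r, str):
                self.dims[x] = (1, 1)
            else:
                (h1, w1), (h2, w2) = self._dims(r.left), self._dims(r.right)
                if r.op == "H" and h1 == h2:
                    self.dims[x] = (h1, w1 + w2)
                elif r.op == "V" and w1 == w2:
                    self.dims[x] = (h1 + h2, w1)
                else:
                    raise ValueError(f"rule {x} has incompatible operands")
        return self.dims[x]

    def access(self, i: int, j: int) -> str:
        """Return the symbol at row i, column j by descending one root-to-leaf path."""
        h, w = self.dims[self.start]
        if not (0 <= i < h and 0 <= j < w):
            raise IndexError((i, j))
        x = self.start
        while True:
            r = self.rules[x]
            if isinstance(r, str):
                return r
            lh, lw = self.dims[r.left]
            if r.op == "H":  # choose left or right block by column
                x, j = (r.left, j) if j < lw else (r.right, j - lw)
            else:            # choose top or bottom block by row
                x, i = (r.left, i) if i < lh else (r.right, i - lh)
```

Each step of `access` discards one operand using the precomputed height and width of the left block, so the work per level is a constant number of dictionary lookups.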

Innovation

Methods, ideas, or system contributions that make the work stand out.

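Extends 1D repetitiveness measures to d-dimensional strings (d ≥ 2)

Shows that the substring-based measures (δ, γ) and the copy-paste-based measures (g, g_rl, b) are incomparable for d ≥ 2

Enables O(log N) direct symbol access via a grammar-based representation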
