🤖 AI Summary
This work addresses the scarcity of systematic studies on repetitiveness measures for two- and higher-dimensional strings. We extend the main one-dimensional measures (δ, γ, g, gᵣₗ, b) to *d* dimensions (d ≥ 2) and distinguish two classes: δ and γ, which are defined in terms of the substrings of the input, and g, gᵣₗ, and b, which are based on copy-paste mechanisms. We show that the two classes become incomparable as soon as d ≥ 2, meaning that neither class bounds the other. We also show that our grammar-based representation of a *d*-dimensional string of size N supports direct access to any symbol in O(log N) time. Finally, we compare the two-dimensional measures with the 2D Block Tree data structure and draw insights for the design of future two-dimensional compressors.
📝 Abstract
The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction of the notion of string attractor [Kempa and Prezza, STOC 2018] and by the results showing the relationship between attractors and other measures of compressibility. When the input data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy often offers an even richer source for compression. However, systematic studies on repetitiveness measures for two-dimensional strings are still scarce. In this paper we extend to two or more dimensions the main measures of complexity introduced for one-dimensional strings. We distinguish between the measures $\delta$ and $\gamma$, defined in terms of the substrings of the input, and the measures $g$, $g_{rl}$, and $b$, which are based on copy-paste mechanisms. We study the properties and mutual relationships between these two classes and we show that the two classes become incomparable for $d$-dimensional inputs as soon as $d \geq 2$. Moreover, we show that our grammar-based representation of a $d$-dimensional string of size $N$ enables direct access to any symbol in $O(\log N)$ time. We also compare our measures for two-dimensional strings with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024] and provide some insights for the design of future effective two-dimensional compressors.
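To make the substring-based class concrete, here is a minimal brute-force sketch of a two-dimensional substring complexity. The abstract does not state the definition, so this assumes the natural generalization of the one-dimensional $\delta$: the maximum, over all shapes $k_1 \times k_2$, of the number of distinct $k_1 \times k_2$ submatrices divided by $k_1 k_2$. The function name `delta_2d` and the row-list input format are hypothetical, chosen only for illustration, and the loop is far slower than any practical method.

```python
def delta_2d(matrix):
    """Brute-force 2D substring complexity (assumed definition):
    max over (k1, k2) of (#distinct k1 x k2 submatrices) / (k1 * k2).

    `matrix` is a non-empty list of equal-length rows (strings or lists).
    """
    rows = len(matrix)
    cols = len(matrix[0])
    best = 0.0
    for k1 in range(1, rows + 1):
        for k2 in range(1, cols + 1):
            distinct = set()
            # Collect every k1 x k2 window as a hashable tuple of row tuples.
            for i in range(rows - k1 + 1):
                for j in range(cols - k2 + 1):
                    block = tuple(
                        tuple(matrix[r][j:j + k2]) for r in range(i, i + k1)
                    )
                    distinct.add(block)
            best = max(best, len(distinct) / (k1 * k2))
    return best


uniform = ["aaaa"] * 4
checkerboard = ["abab", "baba", "abab", "baba"]
print(delta_2d(uniform))
print(delta_2d(checkerboard))
```

Under this assumed definition, a matrix filled with a single symbol has exactly one distinct submatrix of each shape, so the maximum is reached at the $1 \times 1$ shape. The checkerboard has two distinct submatrices of every shape that fits, so its value is also set by the $1 \times 1$ shape.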