🤖 AI Summary
Existing measures of string repetitiveness do not simultaneously capture structural regularity and remain computationally tractable.
Method: We introduce the *suffixient set*, a novel structural representation of repetitiveness, and define its minimum cardinality χ as a new repetitiveness measure. Using combinatorial string analysis and compressed data structure theory, we establish that χ lies between the LZ77 parsing size and the smallest grammar size.
Contribution/Results: χ bridges a theoretical gap by providing the first structured repetitiveness measure grounded in random-access capability. It is computable in polynomial time, admits precise theoretical placement within the hierarchy of compression-based measures, and exhibits fine-grained sensitivity to edit operations (e.g., insertions and deletions). This work establishes a new theoretical foundation and practical metric for indexing repetitive texts, pattern matching, and compression-aware algorithm design.
📝 Abstract
Suffixient sets are novel combinatorial objects that capture the essential information of repetitive strings in a way that, provided with a random-access mechanism, supports various forms of pattern matching. In this paper we study the size $\chi$ of the smallest suffixient set as a repetitiveness measure: we place it between known measures and study its sensitivity to various string operations.
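
The abstract does not spell out what a suffixient set is, so the following is only a minimal sketch under an assumed definition, the one usually given in the suffixient-set literature: a set of text positions is suffixient if every one-character right-extension of every right-maximal substring is a suffix of some prefix ending at one of those positions. A substring is taken to be right-maximal when it is followed by at least two distinct characters in the text. The function names are hypothetical, and whether the text carries a terminator such as `$` is a convention not fixed here. The search is brute force (exponential time), so it is meant for tiny strings, not as the paper's algorithm.

```python
from itertools import combinations


def right_extensions(text):
    """Collect x + a for every right-maximal substring x of text.

    Assumed definition: x is right-maximal if it is followed by at
    least two distinct characters somewhere in text (x may be empty).
    """
    followers = {}
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            # text[i:j] occurs at position i and is followed by text[j]
            followers.setdefault(text[i:j], set()).add(text[j])
    return {x + a for x, chars in followers.items() if len(chars) >= 2 for a in chars}


def is_suffixient(text, positions):
    """Check the assumed suffixient property for 1-based end positions."""
    return all(
        any(text[:p].endswith(ext) for p in positions)
        for ext in right_extensions(text)
    )


def smallest_suffixient_size(text):
    """Brute-force the minimum size of a suffixient set (tiny inputs only)."""
    exts = right_extensions(text)
    n = len(text)
    for k in range(n + 1):
        for cand in combinations(range(1, n + 1), k):
            if all(any(text[:p].endswith(e) for p in cand) for e in exts):
                return k
    return n  # unreachable: the set of all positions is always suffixient


print(smallest_suffixient_size("aaaa$"))  # → 2
```

For `"aaaa$"`, positions 4 and 5 suffice under this definition: the prefix `aaaa` covers the extensions ending in `a`, and the full string covers those ending in `$`. A highly repetitive string thus gets a small value regardless of its length, which is the behavior one would expect from a repetitiveness measure.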