Smallest Suffixient Sets as a Repetitiveness Measure

📅 2025-06-05

🤖 AI Summary

Problem: Existing string repetitiveness measures rarely combine structural regularity with computational tractability. Method: We study the *suffixient set*, a recently proposed structural representation of repetitiveness, and take its minimum cardinality χ as a new repetitiveness measure. Using combinatorial string analysis and compressed data structure theory, we place χ between known repetitiveness measures. Contribution/Results: χ is a structured repetitiveness measure tied to random-access capability: given a random-access mechanism, a suffixient set supports various forms of pattern matching. The measure is computable in polynomial time, has a defined position within the hierarchy of compression-based measures, and its sensitivity to string operations (e.g., insertions and deletions) is analyzed. The work is relevant to indexing repetitive texts, pattern matching, and compression-aware algorithm design.

📝 Abstract

Suffixient sets are a novel combinatorial object that captures the essential information of repetitive strings in a way that, provided with a random-access mechanism, supports various forms of pattern matching. In this paper we study the size $\chi$ of the smallest suffixient set as a repetitiveness measure: we place it between known measures and study its sensitivity to various string operations.
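
The abstract does not restate the definition, so the following is only a brute-force sketch under an assumed definition taken from the suffixient-set literature: a substring `x` is right-maximal if it is followed by at least two distinct characters in the text, and a set of positions is suffixient if every one-character extension `x + a` of a right-maximal `x` is a suffix of some prefix ending at a chosen position. Under that assumption, $\chi$ equals the number of extensions that are not a proper suffix of another extension. The function names are hypothetical, no sentinel is appended automatically, and the enumeration is far slower than what would be used in practice.

```python
from collections import defaultdict


def right_extensions(text):
    """Return every string x + a such that x is right-maximal in text
    (followed by at least two distinct characters) and x + a occurs in text."""
    followers = defaultdict(set)
    n = len(text)
    for i in range(n):
        for j in range(i, n):
            # text[i:j] (possibly empty) is followed by text[j]
            followers[text[i:j]].add(text[j])
    return {x + a for x, chars in followers.items() if len(chars) >= 2 for a in chars}


def smallest_suffixient_set(text):
    """Pick one 1-based end position per extension that is not a proper
    suffix of another extension (assumed characterization of a smallest set)."""
    exts = right_extensions(text)
    maximal = {e for e in exts if not any(f != e and f.endswith(e) for f in exts)}
    return sorted(text.find(e) + len(e) for e in maximal)


def is_suffixient(text, positions):
    """Check the assumed definition directly: every extension must be a
    suffix of a prefix text[:p] for some chosen position p."""
    return all(
        any(text[:p].endswith(e) for p in positions)
        for e in right_extensions(text)
    )


if __name__ == "__main__":
    for t in ["banana", "banana$"]:
        s = smallest_suffixient_set(t)
        print(t, s, len(s), is_suffixient(t, s))
```

Running the sketch on a string before and after a single edit gives a quick, informal feel for the sensitivity questions the paper studies; whether a terminator such as `$` is appended changes the result, so it should be chosen to match the paper's convention.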

Problem

Research questions and friction points this paper is trying to address.

Study the size of the smallest suffixient set as a repetitiveness measure

Compare suffixient sets with known repetitiveness measures

Analyze sensitivity of suffixient sets to string operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel suffixient sets for repetitive strings

Smallest suffixient set as repetitiveness measure

Supports pattern matching when combined with random access
