Efficient Computation of Periods and Covers Using Sampling

📅 2024-07-25
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the efficient computation of string periods and shortest covers, two key primitives in text compression, computational biology, and pattern recognition. It builds on Characters-Distance-Sampling (CDS), a lightweight encoding that stores the distances between occurrences of selected pivot characters, and uses only the first character of the string as the pivot. Working on this sampled representation avoids the redundant full-string scans of conventional methods and supports both period detection and cover verification. The reported speedups are 38%–43% for period computation and 63%–72% for shortest cover detection, relative to the baselines used for comparison.

📝 Abstract
Identifying regularities in strings, such as periods and covers, is crucial for applications in text compression, computational biology, and pattern recognition. Characters-Distance-Sampling (CDS) is an efficient technique that encodes a string by storing distances between selected pivot characters, accelerating string-processing tasks. We apply CDS to compute periods and shortest covers, selecting only the first character as the pivot. This strategy yields optimized computations, achieving speedups of 38%–43% for period computation and 63%–72% for cover detection. These results demonstrate the potential of CDS-based representations for efficient string analysis and broader applications.
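
To make the idea concrete, here is a minimal sketch of a CDS-style encoding with the first character as the only pivot, and of how it can narrow the search for the smallest period. It is an illustration under stated assumptions, not the paper's algorithm: the function names `cds_encode` and `smallest_period` are hypothetical, and the verification step is a plain string comparison. The observation it relies on is that any period p < n must satisfy s[p] == s[0], so only pivot positions need to be tested.

```python
from itertools import accumulate


def cds_encode(s: str) -> list[int]:
    """Distances between consecutive occurrences of the pivot s[0].

    Hypothetical helper: a simplified stand-in for a CDS encoding
    that uses the first character as the only pivot.
    """
    if not s:
        return []
    pivot = s[0]
    positions = [i for i, ch in enumerate(s) if ch == pivot]
    return [b - a for a, b in zip(positions, positions[1:])]


def smallest_period(s: str) -> int:
    """Smallest p such that s[i] == s[i + p] for all valid i.

    Only positions where the pivot occurs are tried, because a
    period p < len(s) forces s[p] == s[0].
    """
    n = len(s)
    # Prefix sums of the distances recover the pivot positions > 0.
    for p in accumulate(cds_encode(s)):
        if s[p:] == s[:n - p]:  # naive verification of candidate p
            return p
    return n  # no shorter period: the string is its own period


if __name__ == "__main__":
    print(cds_encode("abaabaab"))
    print(smallest_period("abaabaab"))
```

For "abaabaab" the pivot 'a' occurs at positions 0, 2, 3, 5, 6, so only the candidates 2, 3, 5, 6 are checked instead of every shift from 1 to 7.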
Problem

Research questions and friction points this paper is trying to address.

Efficient computation of periods and covers in strings.
Application of Characters-Distance-Sampling (CDS) for string analysis.
Achieving significant speedups in period and cover detection tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Characters-Distance-Sampling (CDS) for string encoding
Selects first character as pivot for efficiency
Achieves significant speedups in period and cover computations
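
The same first-character pivot can also guide shortest cover detection. The sketch below is a hedged illustration, not the authors' method: `shortest_cover_length` is a hypothetical name, and the check is deliberately simple. It uses two facts: every occurrence of a cover starts at a pivot position, and a cover of length c cannot be shorter than the largest gap between consecutive pivot positions (or between the last pivot and the end of the string).

```python
def shortest_cover_length(s: str) -> int:
    """Length of the shortest prefix of s whose occurrences cover s.

    Assumption-labelled sketch: candidate occurrences are tested only
    at positions holding the pivot character s[0].
    """
    n = len(s)
    if n == 0:
        return 0
    pivots = [i for i, ch in enumerate(s) if ch == s[0]]
    gaps = [b - a for a, b in zip(pivots, pivots[1:])]
    # A cover must bridge every pivot gap and reach the end of s.
    lower = max(gaps + [n - pivots[-1]])
    for c in range(lower, n + 1):
        w = s[:c]
        covered = 0  # s[0:covered] is covered so far
        for p in pivots:
            if p > covered:
                break  # position `covered` can no longer be covered
            if s.startswith(w, p):
                covered = p + c
        if covered == n:
            return c
    return n  # unreachable in practice: c == n always succeeds


if __name__ == "__main__":
    print(shortest_cover_length("abaababaaba"))
```

For "abaababaaba" the prefix "aba" occurs at positions 0, 3, 5, and 8, and those occurrences together cover all 11 characters.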
Thierry Lecroq
LITIS UR 4108, University of Rouen Normandy, France
Computer Science, Bioinformatics, Algorithms on strings
Francesco Pio Marino
Univ Rouen Normandie, INSA Rouen Normandie, Université Le Havre Normandie, Normandie Univ, LITIS UR 4108, CNRS NormaSTIC FR 3638, IRIB, Rouen F-76000, France; Università di Catania, Dipartimento di Matematica e Informatica, viale A.Doria n.6, 95125, Catania, Italia