Streaming periodicity with mismatches, wildcards, and edits

📅 2025-09-18
🤖 AI Summary
This paper addresses period detection in noisy string streams, where repetitions may differ by mismatches, wildcards, or edit operations. For the Hamming distance, it presents a streaming algorithm that is more efficient than that of Ergün et al.; as a corollary, it handles strings containing wildcards without assuming that the final characters of the string are wildcard-free. For the edit distance, it gives the first two-pass streaming algorithm. The approach combines and extends three techniques: the Hamming distance sketch of Clifford et al., the structural description of $k$-mismatch occurrences by Charalampopoulos et al., and the grammar decomposition of Bhattacharya and Koucký.

📝 Abstract
In this work, we study the problem of detecting periodic trends in strings. While detecting exact periodicity has been studied extensively, real-world data is often noisy, where small deviations or mismatches occur between repetitions. This work focuses on a generalized approach to period detection that efficiently handles noise. Given a string $S$ of length $n$, the task is to identify integers $p$ such that the prefix and the suffix of $S$, each of length $n-p+1$, are similar under a given distance measure. Ergün et al. [APPROX-RANDOM 2017] were the first to study this problem in the streaming model under the Hamming distance. In this work, we combine, in a non-trivial way, the Hamming distance sketch of Clifford et al. [SODA 2019] and the structural description of the $k$-mismatch occurrences of a pattern in a text by Charalampopoulos et al. [FOCS 2020] to present a more efficient streaming algorithm for period detection under the Hamming distance. As a corollary, we derive a streaming algorithm for detecting periods of strings which may contain wildcards, a special symbol that matches any character of the alphabet. Our algorithm is not only more efficient than that of Ergün et al. [TCS 2020], but it also operates without their assumption that the string must be free of wildcards in its final characters. Additionally, we introduce the first two-pass streaming algorithm for computing periods under the edit distance by leveraging and extending the grammar decomposition technique of Bhattacharya and Koucký [STOC 2023].
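To make the problem statement concrete, here is a minimal offline sketch of the Hamming-distance variant with wildcards. It is a brute-force check that stores the whole string and takes quadratic time, not the streaming algorithm of the paper. The function names, the choice of `?` as the wildcard symbol, and the mismatch budget `k` are assumptions made for illustration. The sketch reads the overlap length $n-p+1$ literally from the abstract, so $p$ acts as the 1-indexed starting position of the suffix and $p = 1$ is skipped as trivial.

```python
WILDCARD = "?"  # hypothetical wildcard symbol chosen for this sketch


def hamming_with_wildcards(a: str, b: str) -> int:
    """Count positions where a and b differ; a wildcard matches any character."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(
        1
        for x, y in zip(a, b)
        if x != y and x != WILDCARD and y != WILDCARD
    )


def k_mismatch_periods(s: str, k: int) -> list[int]:
    """Return every p in 2..n whose length-(n-p+1) prefix and suffix
    differ in at most k positions (indexing taken literally from the abstract)."""
    n = len(s)
    result = []
    for p in range(2, n + 1):
        m = n - p + 1  # overlap length as stated in the abstract
        if hamming_with_wildcards(s[:m], s[n - m:]) <= k:
            result.append(p)
    return result
```

Under this indexing, `k_mismatch_periods("abcabcabc", 0)` returns `[4, 7]`, and the same answer is obtained for `"abcab?abc"` because the wildcard absorbs the perturbed position. Under the more common convention, where a shift by $p$ compares overlaps of length $n-p$, these values would be reported as 3 and 6.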
Problem

Research questions and friction points this paper is trying to address.

Detecting periodic trends in noisy strings with mismatches
Handling wildcards in streaming periodicity detection efficiently
Computing periods under the edit distance in a two-pass streaming setting
Innovation

Methods, ideas, or system contributions that make the work stand out.

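Combining the Hamming distance sketch of Clifford et al. with the structural description of $k$-mismatch occurrences by Charalampopoulos et al.
More efficient streaming algorithm for period detection under the Hamming distance, including strings with wildcards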
Two-pass streaming algorithm for edit distance periods
Taha El Ghazi
DIENS, École normale supérieure de Paris, PSL Research University, France
Tatiana Starikovskaya
Ecole Normale Supérieure
Stringology, randomized algorithms, approximate algorithms, streaming algorithms, communication