Fast Computation of $k$-Runs, Parameterized Squares, and Other Generalised Squares

📅 2025-09-02

🤖 AI Summary

This paper addresses the efficient computation of repetitive structures in strings. It covers two kinds: k-mismatch repeats (k-repetitions and k-runs, each a sequence of consecutive equal-length squares whose halves differ in at most k positions) and parameterized squares. It proves that the number of k-repetitions and of k-runs in a string of length n is *O(nk log k)*, so the algorithms of Kolpakov and Kucherov for computing them run in *O(nk log k)* time overall. Building on this bound, it gives an algorithm that reports all non-equivalent parameterized squares in *O(nσ log σ)* time. It also shows how to count non-equivalent squares in *O(n log n)* time. This counting algorithm applies to squares under any substring compatible equivalence relation, including parameterized equivalence, and also to squares that are distinct as strings. Key contributions include: (i) an *O(nk log k)* upper bound on the number of k-repetitions and k-runs; (ii) an *O(nσ log σ)*-time algorithm for reporting all non-equivalent parameterized squares; and (iii) an *O(n log n)*-time counting algorithm for squares under any substring compatible equivalence relation. The counting algorithm improves on the *O(nσ)*-time algorithm of Gawrychowski et al. for counting order-preserving squares that are distinct as strings when *σ = ω(log n)*.
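To make the notion of a k-mismatch square concrete, here is a minimal brute-force sketch (roughly cubic time, not the paper's or Kolpakov and Kucherov's algorithm). It lists every substring XY with |X| = |Y| and at most k mismatches between X and Y. It then groups squares of the same half-length whose start positions are consecutive into maximal blocks. The function names are hypothetical. The grouping only illustrates "a sequence of consecutive k-mismatch squares of equal length" and is an assumption: it does not reproduce the exact formal definitions of k-repetitions or k-runs.

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(a != b for a, b in zip(x, y))


def k_mismatch_squares(s: str, k: int) -> list[tuple[int, int]]:
    """All (start, half) pairs such that s[start:start + 2*half] = XY with
    |X| = |Y| = half and at most k mismatches between X and Y.
    Brute force over all half-lengths and start positions."""
    n = len(s)
    out = []
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            x = s[start:start + half]
            y = s[start + half:start + 2 * half]
            if hamming(x, y) <= k:
                out.append((start, half))
    return out


def group_consecutive(squares: list[tuple[int, int]]) -> list[tuple[int, int, int]]:
    """Hypothetical illustration: merge squares of equal half-length whose
    start positions are consecutive into maximal blocks
    (first_start, last_start, half). Not the formal k-run definition."""
    by_half: dict[int, list[int]] = {}
    for start, half in squares:
        by_half.setdefault(half, []).append(start)
    blocks = []
    for half, starts in sorted(by_half.items()):
        starts.sort()
        first = prev = starts[0]
        for st in starts[1:]:
            if st != prev + 1:  # gap: close the current block
                blocks.append((first, prev, half))
                first = st
            prev = st
        blocks.append((first, prev, half))
    return blocks
```

For example, `k_mismatch_squares("abab", 1)` contains the three squares of half-length 1 starting at positions 0, 1, and 2, which `group_consecutive` merges into a single block.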

📝 Abstract

A $k$-mismatch square is a string of the form $XY$ where $X$ and $Y$ are two equal-length strings that have at most $k$ mismatches. Kolpakov and Kucherov [Theor. Comput. Sci., 2003] defined two notions of $k$-mismatch repeats, called $k$-repetitions and $k$-runs, each representing a sequence of consecutive $k$-mismatch squares of equal length. They proposed algorithms for computing $k$-repetitions and $k$-runs working in $O(nk\log k + \mathit{output})$ time for a string of length $n$ over an integer alphabet, where $\mathit{output}$ is the number of reported repeats. We show that $\mathit{output} = O(nk\log k)$, both in the case of $k$-repetitions and of $k$-runs, which implies that the complexity of their algorithms is actually $O(nk\log k)$. We apply this result to computing parameterized squares. A parameterized square is a string of the form $XY$ such that $X$ and $Y$ parameterized-match, i.e., there exists a bijection $f$ on the alphabet such that $f(X) = Y$. Two parameterized squares $XY$ and $X'Y'$ are equivalent if they parameterized-match. Recently, Hamai et al. [SPIRE 2024] showed that a string of length $n$ over an alphabet of size $\sigma$ contains fewer than $n\sigma$ non-equivalent parameterized squares, improving an earlier bound by Kociumaka et al. [Theor. Comput. Sci., 2016]. We apply our bound for $k$-mismatch repeats to propose an algorithm that reports all non-equivalent parameterized squares in $O(n\sigma\log\sigma)$ time. We also show that the number of non-equivalent parameterized squares can be computed in $O(n\log n)$ time. This last algorithm applies to squares under any substring compatible equivalence relation and also to counting squares that are distinct as strings. In particular, this improves upon the $O(n\sigma)$-time algorithm of Gawrychowski et al. [CPM 2023] for counting order-preserving squares that are distinct as strings if $\sigma = \omega(\log n)$.
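The definitions of parameterized squares and their equivalence can be checked directly with a canonical form. The sketch below uses the standard prev-encoding: 0 at the first occurrence of a symbol, otherwise the distance back to its previous occurrence. Two strings parameterized-match exactly when their prev-encodings are equal. Counting non-equivalent squares then amounts to collecting the canonical forms of all squares $XY$ with $X \approx Y$. The same brute-force counter is reused with two other canonical forms: the identity (squares distinct as strings) and a rank encoding (order-preserving equivalence). This is an illustrative, roughly $O(n^3)$-time sketch, not the paper's $O(n\sigma\log\sigma)$ reporting or $O(n\log n)$ counting algorithm. All function names are hypothetical, and passing a "canonical form" function is an assumption standing in for a general substring compatible equivalence relation.

```python
from typing import Callable


def prev_encoding(s: str) -> tuple:
    """0 at a symbol's first occurrence, else distance to its previous
    occurrence. Two strings parameterized-match iff these tuples are equal."""
    last: dict[str, int] = {}
    out = []
    for i, c in enumerate(s):
        out.append(i - last[c] if c in last else 0)
        last[c] = i
    return tuple(out)


def p_match(x: str, y: str) -> bool:
    """True iff some bijection on the alphabet maps x to y letter by letter."""
    return len(x) == len(y) and prev_encoding(x) == prev_encoding(y)


def identity(s: str) -> tuple:
    """Canonical form for plain string equality."""
    return tuple(s)


def order_encoding(s: str) -> tuple:
    """Rank of each symbol among the distinct symbols of s; two strings are
    order-isomorphic iff these tuples are equal."""
    ranks = {c: r for r, c in enumerate(sorted(set(s)))}
    return tuple(ranks[c] for c in s)


def count_nonequivalent_squares(s: str, canon: Callable[[str], tuple]) -> int:
    """Brute force: XY is a square under `canon` if canon(X) == canon(Y);
    two squares are equivalent if canon(XY) coincide."""
    n = len(s)
    seen = set()
    for half in range(1, n // 2 + 1):
        for start in range(n - 2 * half + 1):
            x = s[start:start + half]
            y = s[start + half:start + 2 * half]
            if canon(x) == canon(y):
                seen.add(canon(x + y))
    return len(seen)
```

For `s = "aabb"`, this counts 2 squares distinct as strings (`aa`, `bb`) and 3 non-equivalent parameterized squares. Every length-2 substring is a parameterized square, `aa` and `bb` fall into one class, and `aabb` is a further square.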

Problem

Research questions and friction points this paper is trying to address.

Computing k-mismatch repeats efficiently

Reporting non-equivalent parameterized squares

Counting distinct squares under equivalence relations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proved an O(nk log k) bound on the number of k-mismatch repeats

Proposed an O(nσ log σ)-time algorithm for reporting non-equivalent parameterized squares

Developed an O(n log n)-time method for counting non-equivalent squares under substring compatible equivalence relations
