🤖 AI Summary
This work studies the one-way communication complexity of Pattern Matching with Edits: given a text of length \(n\), a pattern of length \(m\), and an error threshold \(k\), the goal is to encode, in as few bits as possible, enough information to recover all fragments of the text whose edit distance to the pattern is at most \(k\), together with a shortest edit sequence for each of them. By revisiting the proof from their earlier STOC'24 work, the authors obtain a more compact encoding of \(O\!\left(\frac{n}{m} \cdot k \cdot \log\frac{m|\Sigma|}{k}\right)\) bits for strings over an alphabet \(\Sigma\). For constant-sized alphabets, this matches the known lower bound \(\Omega\!\left(\frac{n}{m} \cdot k \cdot \log\frac{m}{k}\right)\). The paper also establishes a new tight lower bound of \(\Omega\!\left(\frac{n}{m} \cdot k \cdot \log\frac{m|\Sigma|}{k}\right)\) for the variant that reports edit sequences. The new encoding size additionally matches the communication complexity previously established for the simpler Pattern Matching with Mismatches problem in the streaming setting.
📝 Abstract
In the decades-old Pattern Matching with Edits problem, given a length-$n$ string $T$ (the text), a length-$m$ string $P$ (the pattern), and a positive integer $k$ (the threshold), the task is to list the $k$-error occurrences of $P$ in $T$, that is, all fragments of $T$ whose edit distance to $P$ is at most $k$. The one-way communication complexity of Pattern Matching with Edits is the minimum number of bits that Alice, given an instance $(P, T, k)$ of the problem, must send to Bob so that Bob can reconstruct the answer solely from that message.
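To make the definition of a $k$-error occurrence concrete, the following is a minimal brute-force sketch in Python. It is only an illustration of the problem statement, not the encoding or any algorithm from the paper. The function name `k_error_occurrences` and the output format `(start, end, distance)` are assumptions chosen for this example, and the sketch assumes $k < m$, so empty fragments never qualify.

```python
def k_error_occurrences(pattern: str, text: str, k: int) -> list[tuple[int, int, int]]:
    """List (start, end, distance) for every non-empty fragment text[start:end]
    whose edit distance to pattern is at most k.

    Brute-force illustration (hypothetical helper, not the paper's method).
    Assumes k < len(pattern), so the empty fragment is never an occurrence.
    """
    m, n = len(pattern), len(text)
    occurrences = []
    for start in range(n):
        # A fragment longer than m + k needs more than k deletions, so skip it.
        max_len = min(n - start, m + k)
        # row[j] = edit distance between pattern[:j] and the current fragment;
        # for the empty fragment this is j insertions.
        row = list(range(m + 1))
        for length in range(1, max_len + 1):
            ch = text[start + length - 1]
            new_row = [length] + [0] * m
            for j in range(1, m + 1):
                new_row[j] = min(
                    row[j] + 1,                         # drop ch from the fragment
                    new_row[j - 1] + 1,                 # pattern[j-1] has no partner
                    row[j - 1] + (pattern[j - 1] != ch),  # match or substitute
                )
            row = new_row
            if row[m] <= k:
                occurrences.append((start, start + length, row[m]))
    return occurrences


if __name__ == "__main__":
    # Fragments of "xabcx" within edit distance 1 of "abc".
    print(k_error_occurrences("abc", "xabcx", 1))
    # prints [(0, 4, 1), (1, 3, 1), (1, 4, 0), (1, 5, 1), (2, 4, 1)]
```

Even this tiny instance has five $k$-error occurrences around a single exact match, which gives some intuition for why a compact encoding of all of them is not obvious.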
For the natural parameter regime of $0 < k < m < n/2$, our recent work [STOC'24] shows that $\Omega(n/m \cdot k \log(m/k))$ bits are necessary and $O(n/m \cdot k \log^2 m)$ bits are sufficient for Pattern Matching with Edits. More generally, for strings over an alphabet $\Sigma$, the same work gives an $O(n/m \cdot k \log m \log(m|\Sigma|))$-bit encoding that allows one to recover a shortest sequence of edits for every $k$-error occurrence of $P$ in $T$.
In this work, we revisit the original proof and improve the encoding size to $O(n/m \cdot k \log(m|\Sigma|/k))$, which matches the lower bound for constant-sized alphabets. We further establish a new tight lower bound of $\Omega(n/m \cdot k \log(m|\Sigma|/k))$ for the edit sequence reporting variant that we solve. Our encoding size also matches the communication complexity established for the simpler Pattern Matching with Mismatches problem in the context of streaming algorithms [Clifford, Kociumaka, Porat; SODA'19].
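As a short sanity check of why the new upper bound can meet the $\Omega(n/m \cdot k \log(m/k))$ lower bound when $|\Sigma| = O(1)$, one can split the logarithm. The restriction $k \le m/2$ (so that $\log_2(m/k) \ge 1$) is an assumption made here for simplicity and is not taken from the abstract; the paper may treat the remaining range of $k$ differently.

$$
\log\frac{m|\Sigma|}{k} \;=\; \log\frac{m}{k} + \log|\Sigma| \;\le\; \bigl(1 + \log|\Sigma|\bigr)\log\frac{m}{k} \qquad \text{for } k \le m/2 .
$$

Under that assumption, a constant-sized alphabet turns $O(n/m \cdot k \log(m|\Sigma|/k))$ into $O(n/m \cdot k \log(m/k))$, which agrees with the lower bound up to a constant factor.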