🤖 AI Summary
This study investigates how sensitive the LZ77 compression algorithm is to at most $k$ edit operations on its input string. It establishes a tight upper bound on the compressed size after edits: $C_{\mathrm{LZ77}}(w') \leq 3 \cdot C_{\mathrm{LZ77}}(w) + 4k$, in contrast with LZ78, where a single edit can significantly deteriorate compressibility (the “one-bit catastrophe”). It then identifies three distinct sensitivity regimes, determined by the compressibility of $w$, in which the multiplicative factor is approximately 3, 2, or 1. Finally, the paper proposes an approximation algorithm for pre-editing a string; in favorable cases, and accounting for the storage overhead of the edits themselves, this reduces the total compressed size by up to a factor of three.
📝 Abstract
We study the sensitivity of the Lempel-Ziv 77 compression algorithm to edits, showing how modifying a string $w$ can deteriorate or improve its compression. Our first result is a tight upper bound for $k$ edits: for every $w' \in B(w,k)$, that is, every string obtained from $w$ by at most $k$ edits, we have $C_{\mathrm{LZ77}}(w') \leq 3 \cdot C_{\mathrm{LZ77}}(w) + 4k$. This result contrasts with Lempel-Ziv 78, where a single edit can significantly deteriorate compressibility, a phenomenon known as a *one-bit catastrophe*.
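The bound above can be explored numerically on small inputs. The following is a minimal sketch, not the paper's construction: it assumes a greedy LZ77-style parse in which each phrase is the longest prefix of the remaining suffix that also starts at an earlier position (overlaps allowed), or a single fresh character, and it takes $C_{\mathrm{LZ77}}$ to be the number of phrases. The paper's exact variant may differ. The helper names `lz77_phrases`, `one_edit_neighbors`, and `worst_after_one_edit` are hypothetical.

```python
def lz77_phrases(w):
    """Greedy parse (assumed variant): longest earlier-starting match, else one fresh character."""
    phrases = []
    i, n = 0, len(w)
    while i < n:
        best = 0
        for j in range(i):  # every earlier starting position is a candidate source
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1  # the source may overlap the phrase being built
            best = max(best, length)
        step = max(best, 1)  # no match at all: emit a single new character
        phrases.append(w[i:i + step])
        i += step
    return phrases


def one_edit_neighbors(w, alphabet):
    """All strings at edit distance exactly 1 from w (insert, substitute, delete)."""
    out = set()
    for i in range(len(w) + 1):
        for c in alphabet:
            out.add(w[:i] + c + w[i:])            # insertion before position i
            if i < len(w):
                out.add(w[:i] + c + w[i + 1:])    # substitution at position i
        if i < len(w):
            out.add(w[:i] + w[i + 1:])            # deletion of position i
    out.discard(w)  # substituting a character by itself is not an edit
    return out


def worst_after_one_edit(w, alphabet):
    """Largest phrase count over all single-edit neighbors of w."""
    return max(len(lz77_phrases(v)) for v in one_edit_neighbors(w, alphabet))


w = "aaaaaa"
c = len(lz77_phrases(w))                 # phrases: "a", "aaaaa"
worst = worst_after_one_edit(w, "ab")
print(c, worst, worst <= 3 * c + 4)      # compare against 3*C + 4k with k = 1
```

The brute-force search is exponential in $k$ and quadratic per parse, so it is only meant for sanity checks on short strings.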
We further refine this bound, focusing on the coefficient $3$ in front of $C_{\mathrm{LZ77}}(w)$, and establish a surprising trichotomy based on the compressibility of $w$. More precisely, writing $n = |w|$, we prove the following bounds:
- if $C_{\mathrm{LZ77}}(w) \lesssim k^{3/2}\sqrt{n}$, the compressed size may increase by a factor of up to $\approx 3$,
- if $k^{3/2}\sqrt{n} \lesssim C_{\mathrm{LZ77}}(w) \lesssim k^{1/3}n^{2/3}$, this factor is at most $\approx 2$,
- if $C_{\mathrm{LZ77}}(w) \gtrsim k^{1/3}n^{2/3}$, the factor is at most $\approx 1$.
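As a quick consistency check on the three regimes (a back-of-the-envelope reading that ignores the constants hidden in $\lesssim$ and $\gtrsim$, and assumes $n$ denotes $|w|$), the middle regime is non-empty only when the lower threshold does not exceed the upper one:

$$k^{3/2}\sqrt{n} \lesssim k^{1/3}n^{2/3} \iff k^{7/6} \lesssim n^{1/6} \iff k \lesssim n^{1/7}.$$

For instance, with $k = 1$ and $n = 10^6$, the two thresholds are $\sqrt{n} = 10^3$ and $n^{2/3} = 10^4$. Under this reading, a word with $C_{\mathrm{LZ77}}(w) \approx 10^2$ falls in the first regime (factor up to $\approx 3$), one with $C_{\mathrm{LZ77}}(w) \approx 5 \cdot 10^3$ in the second (factor at most $\approx 2$), and one with $C_{\mathrm{LZ77}}(w) \approx 10^5$ in the third (factor at most $\approx 1$).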
Finally, we present an $\varepsilon$-approximation algorithm to pre-edit a word $w$ with a budget of $k$ modifications to improve its compression. In favorable scenarios, this approach reduces the total compressed size by up to a factor of $3$, where the total accounts for both the LZ77 compression of the modified word and the cost of storing the edits: $C_{\mathrm{LZ77}}(w') + k \log |w|$.
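To make the cost $C_{\mathrm{LZ77}}(w') + k \log |w|$ concrete, here is a minimal brute-force sketch for a budget of one substitution. It is not the paper's $\varepsilon$-approximation algorithm; it simply tries every single-character substitution and keeps the original word unless an edit lowers the total. It assumes a greedy LZ77-style parse counted in phrases and a base-2 logarithm, and the names `lz77_size`, `total_cost`, and `best_pre_edit` are hypothetical.

```python
import math


def lz77_size(w):
    """Number of phrases in a greedy LZ77-style parse (assumed variant, overlaps allowed)."""
    i, n, count = 0, len(w), 0
    while i < n:
        best = 0
        for j in range(i):  # try every earlier starting position
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            best = max(best, length)
        i += max(best, 1)  # a fresh character forms a phrase of length 1
        count += 1
    return count


def total_cost(edited, k, n):
    """C(w') + k * log2(n): parse size of the edited word plus the cost of storing k edits."""
    return lz77_size(edited) + k * math.log2(n)


def best_pre_edit(w, alphabet):
    """Return (cost, word, edits_used) minimising total_cost over at most one substitution."""
    n = len(w)
    best = (total_cost(w, 0, n), w, 0)  # baseline: no edit, nothing extra to store
    for i in range(n):
        for c in alphabet:
            if c == w[i]:
                continue
            v = w[:i] + c + w[i + 1:]
            cand = (total_cost(v, 1, n), v, 1)
            if cand[0] < best[0]:  # strict: ties keep the unedited word
                best = cand
    return best


w = "a" * 8 + "b" + "a" * 8
print(lz77_size(w), best_pre_edit(w, "ab"))
```

On this short input, replacing the lone `b` halves the phrase count, yet the $\log_2 17 \approx 4.09$ cost of recording the edit outweighs the saving, so the search keeps the original word. Pre-editing pays off only when the drop in phrase count exceeds the storage cost of the edits.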