Analyzing and Leveraging the $k$-Sensitivity of LZ77

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how sensitive the LZ77 compression algorithm is to at most $k$ string edit operations. Using combinatorial analysis of the structure of LZ77 parses, the work establishes a tight upper bound on the compressed size after edits: the post-edit compressed size is at most three times the original plus an additive term of $4k$. Unlike LZ78, where a single edit can trigger a “one-bit catastrophe”, LZ77 therefore cannot degrade catastrophically under a bounded number of edits. The work further identifies three distinct sensitivity regimes, depending on how compressible the input is. Finally, the paper proposes an approximation algorithm for pre-editing the input that, in favorable cases and even after accounting for the cost of storing the edits, reduces the total compressed size by up to a factor of three.
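To make the bound concrete, the following is a minimal sketch that counts the phrases of a greedy LZ77-style parse and compares the worst case over all single edits with $3 \cdot C + 4k$ for $k = 1$. Several points are assumptions for illustration and are not taken from the paper: the compressed size is taken to be the number of phrases, the parse allows self-overlapping references, the edit operations are insertion, deletion and substitution, and the names `lz77_phrase_count` and `one_edit_ball` are hypothetical. The paper's exact LZ77 variant may differ, so the printed numbers are indicative only.

```python
def lz77_phrase_count(w: str) -> int:
    """Count phrases in a greedy LZ77-style parse (naive, cubic-time sketch).

    Assumed variant: each phrase is the longest prefix of the remaining
    suffix that also starts at an earlier position (overlap allowed),
    or a single fresh character if no such prefix exists.
    """
    n = len(w)
    i = 0
    phrases = 0
    while i < n:
        longest = 0
        for j in range(i):  # candidate earlier starting positions
            length = 0
            while i + length < n and w[j + length] == w[i + length]:
                length += 1
            longest = max(longest, length)
        i += max(longest, 1)  # a fresh character still forms one phrase
        phrases += 1
    return phrases


def one_edit_ball(w: str, alphabet: str) -> set[str]:
    """All strings within one insertion, deletion or substitution of w."""
    ball = {w}
    for i in range(len(w) + 1):
        for c in alphabet:
            ball.add(w[:i] + c + w[i:])          # insertion before position i
        if i < len(w):
            ball.add(w[:i] + w[i + 1:])          # deletion of position i
            for c in alphabet:
                ball.add(w[:i] + c + w[i + 1:])  # substitution at position i
    return ball


w = "ab" * 8
c = lz77_phrase_count(w)
worst = max(lz77_phrase_count(v) for v in one_edit_ball(w, "ab"))
print("C(w) =", c, "| worst after one edit =", worst, "| 3*C + 4k =", 3 * c + 4)
```

Brute-force enumeration of the edit ball is only feasible for tiny inputs and small $k$; it is meant as a sanity check on small examples rather than a way to reproduce the paper's results.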

📝 Abstract

We study the sensitivity of the Lempel-Ziv 77 compression algorithm to edits, showing how modifying a string $w$ can deteriorate or improve its compression. Our first result is a tight upper bound for $k$ edits: $\forall w' \in B(w,k)$, we have $C_{\mathrm{LZ77}}(w') \leq 3 \cdot C_{\mathrm{LZ77}}(w) + 4k$. This result contrasts with Lempel-Ziv 78, where a single edit can significantly deteriorate compressibility, a phenomenon known as a *one-bit catastrophe*. We further refine this bound, focusing on the coefficient $3$ in front of $C_{\mathrm{LZ77}}(w)$, and establish a surprising trichotomy based on the compressibility of $w$. More precisely, we prove the following bounds:

- if $C_{\mathrm{LZ77}}(w) \lesssim k^{3/2}\sqrt{n}$, the compression may increase by up to a factor of $\approx 3$,
- if $k^{3/2}\sqrt{n} \lesssim C_{\mathrm{LZ77}}(w) \lesssim k^{1/3}n^{2/3}$, this factor is at most $\approx 2$,
- if $C_{\mathrm{LZ77}}(w) \gtrsim k^{1/3}n^{2/3}$, the factor is at most $\approx 1$.

Finally, we present an $\varepsilon$-approximation algorithm to pre-edit a word $w$ with a budget of $k$ modifications to improve its compression. In favorable scenarios, this approach yields a total compressed size reduction by up to a factor of $3$, accounting for both the LZ77 compression of the modified word and the cost of storing the edits, $C_{\mathrm{LZ77}}(w') + k \log |w|$.
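The trichotomy and the total-cost expression can be turned into a small calculator. This is a rough sketch under stated assumptions: the constants hidden by $\lesssim$ and $\gtrsim$ are ignored (treated as 1), $n$ is taken to be $|w|$, the logarithm in $k \log |w|$ is assumed to be base 2, and the function names `sensitivity_regime` and `total_cost` are hypothetical. The regime boundaries it reports are therefore only orders of magnitude, not the paper's exact thresholds.

```python
import math


def sensitivity_regime(c: float, n: int, k: int) -> int:
    """Return the approximate worst-case growth factor (3, 2 or 1).

    Assumes hidden constants equal to 1 and that the two thresholds are
    ordered, i.e. k**1.5 * sqrt(n) <= k**(1/3) * n**(2/3).
    """
    low = k ** 1.5 * math.sqrt(n)          # k^{3/2} * sqrt(n)
    high = k ** (1 / 3) * n ** (2 / 3)     # k^{1/3} * n^{2/3}
    if c <= low:
        return 3
    if c <= high:
        return 2
    return 1


def total_cost(c_edited: float, k: int, n: int) -> float:
    """Total size C_LZ77(w') + k * log|w|, with log assumed to be base 2."""
    return c_edited + k * math.log2(n)


n, k = 10**6, 1
for c in (500, 5_000, 50_000):
    print(c, "-> factor about", sensitivity_regime(c, n, k))
print("total cost:", total_cost(10, 2, 1024))
```

Pre-editing pays off only when `total_cost` for the edited word is smaller than the phrase count of the original word, which is the comparison the last sentence of the abstract refers to.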

Problem

Research questions and friction points this paper is trying to address.

LZ77

compression sensitivity

string editing

compressibility

k-edits

Innovation

Methods, ideas, or system contributions that make the work stand out.

LZ77 sensitivity

edit distance

compression trichotomy