🤖 AI Summary
This work addresses the gap between Shannon’s rate-distortion theory and practical performance at finite blocklengths, focusing on the Bernoulli source under Hamming distortion. Starting from first principles, it derives the rate-distortion function \( R(D) = H(p) - H(D) \), valid for \( 0 \le D \le \min(p, 1-p) \), where \( H \) is the binary entropy function. Combining the Blahut–Arimoto algorithm with finite-blocklength asymptotic analysis, the paper builds a tutorial-style framework around the rate-distortion dispersion \( V(D) \), which characterizes the \( O(1/\sqrt{n}) \) rate of convergence to the asymptotic limit. Reproducible Python simulations check the theoretical predictions, giving both a quantitative picture of finite-blocklength compression limits and a practical toolkit for applied analysis.
📝 Abstract
Lossy data compression lies at the heart of modern communication and storage systems. Shannon's rate-distortion theory gives the fundamental limit on how far a source can be compressed at a given fidelity, but that limit is attained only as the block length grows without bound, which never happens in practice. We present a self-contained tutorial on rate-distortion theory for the simplest non-trivial source: a Bernoulli$(p)$ sequence with Hamming distortion. We derive the classical rate-distortion function $R(D) = H(p) - H(D)$ from first principles, illustrate its computation via the Blahut-Arimoto algorithm, and then develop the finite-blocklength refinements that characterize how the minimum achievable rate approaches the Shannon limit as the block length $n$ grows. The central quantity in this refinement is the \emph{rate-distortion dispersion} $V(D)$, which governs the $O(1/\sqrt{n})$ penalty for operating at finite block lengths. All theoretical developments are accompanied by numerical examples and figures generated by Python scripts.
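
To make the quantities named in the abstract concrete, the following is a minimal Python sketch and not the paper's own scripts. It evaluates the closed form $R(D) = H(p) - H(D)$, runs a basic Blahut-Arimoto iteration at a fixed slope parameter, and evaluates a Gaussian (normal) approximation to the finite-blocklength rate. Several things here are assumptions not stated in the abstract: all function names, the slope parameter `beta`, the excess-distortion probability `eps`, and the dispersion expression $V(D) = p(1-p)\log_2^2\frac{1-p}{p}$, which is the form reported in the finite-blocklength literature for this source with $D < p < 1/2$. The approximation also drops the lower-order terms beyond $O(1/\sqrt{n})$.

```python
import math
from statistics import NormalDist

import numpy as np


def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def rate_distortion_closed_form(p, D):
    """R(D) = H(p) - H(D) for D < min(p, 1 - p), and 0 otherwise (bits)."""
    if D >= min(p, 1 - p):
        return 0.0
    return binary_entropy(p) - binary_entropy(D)


def blahut_arimoto(p, beta, tol=1e-12, max_iter=10_000):
    """Return one (R, D) point for a Bernoulli(p) source, Hamming distortion.

    beta > 0 is the (assumed) slope parameter; larger beta gives smaller D.
    """
    px = np.array([1.0 - p, p])                # source distribution
    dist = np.array([[0.0, 1.0], [1.0, 0.0]])  # Hamming distortion d(x, xhat)
    kernel = np.exp(-beta * dist)
    q = np.array([0.5, 0.5])                   # initial reproduction marginal

    for _ in range(max_iter):
        # Test channel Q(xhat | x) proportional to q(xhat) * exp(-beta * d).
        cond = q * kernel
        cond /= cond.sum(axis=1, keepdims=True)
        q_new = px @ cond                      # updated reproduction marginal
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break

    cond = q * kernel
    cond /= cond.sum(axis=1, keepdims=True)
    joint = px[:, None] * cond
    D = float(np.sum(joint * dist))
    R = float(np.sum(joint * np.log2(cond / q)))  # mutual information in bits
    return R, D


def normal_approximation_rate(p, D, n, eps):
    """R(D) + sqrt(V / n) * Qinv(eps), with V as assumed in the lead-in."""
    V = p * (1 - p) * math.log2((1 - p) / p) ** 2
    q_inv = NormalDist().inv_cdf(1.0 - eps)    # inverse Gaussian tail function
    return rate_distortion_closed_form(p, D) + math.sqrt(V / n) * q_inv


if __name__ == "__main__":
    p, beta = 0.3, 2.0
    R_ba, D_ba = blahut_arimoto(p, beta)
    print("Blahut-Arimoto point:", R_ba, D_ba)
    print("Closed form at same D:", rate_distortion_closed_form(p, D_ba))
    for n in (100, 1_000, 10_000):
        print(n, normal_approximation_rate(p, D_ba, n, eps=0.1))
```

Running the script compares the Blahut-Arimoto point with the closed form at the same distortion, then prints the approximate rate for increasing $n$. The gap to $R(D)$ shrinks in proportion to $1/\sqrt{n}$, so quadrupling the block length halves the penalty.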