The LZ78 Source

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates the information-theoretic properties of the nonstationary, non-Markov probability sources induced by the LZ78 universal compressor. Method: Using information-theoretic analysis, ergodic theory, and convergence results for empirical distributions, the authors establish asymptotic properties of these sources. Contribution/Results: The paper proves that such sources satisfy a Shannon-McMillan-Breiman-type theorem (the normalized log probability of a realization converges almost surely to the entropy rate) and that their finite-dimensional empirical distributions converge almost surely to a deterministic i.i.d. law, so the sources are "almost stationary and ergodic" and locally "almost i.i.d." For stationary ergodic sources the finite-state compressibility and the entropy rate coincide; here the finite-state compressibility of a realization is almost surely strictly larger than the entropy rate, by a "Jensen gap." Numerical simulations corroborate the theoretical predictions. These sources offer a benchmark for evaluating sequential probability models on nonstationary, non-Markov data.

📝 Abstract
We study a family of processes generated according to sequential probability assignments induced by the LZ78 universal compressor. We characterize entropic and distributional properties such as their entropy and relative entropy rates, finite-state compressibility and log loss of their realizations, and the empirical distributions that they induce. Though not quite stationary, these sources are "almost stationary and ergodic;" similar to stationary and ergodic processes, they satisfy a Shannon-McMillan-Breiman-type property: the normalized log probability of their realizations converges almost surely to their entropy rate. Further, they are locally "almost i.i.d." in the sense that the finite-dimensional empirical distributions of their realizations converge almost surely to a deterministic i.i.d. law. However, unlike stationary ergodic sources, the finite-state compressibility of their realizations is almost surely strictly larger than their entropy rate by a "Jensen gap." We present simulations demonstrating the theoretical results. Among their potential uses, these sources make it possible to gauge the performance of sequential probability models on non-Markovian, non-stationary data.
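To make "generated according to sequential probability assignments induced by LZ78" concrete, here is a minimal sketch of one way such a source could be simulated. It is not taken from the paper: it assumes an additive-smoothing (Dirichlet-style) estimate at each node of the LZ78 parsing tree, and the function name `sample_lz78_source` and the smoothing parameter `gamma` are hypothetical. Each symbol is drawn from the current node's predictive distribution, and the running log loss gives the normalized log probability that the Shannon-McMillan-Breiman-type property refers to.

```python
import math
import random


def sample_lz78_source(n, alphabet_size=2, gamma=0.5, seed=0):
    """Draw n symbols from an LZ78-tree-based sequential probability assignment.

    Assumption (not from the source text): each tree node predicts with
    additive smoothing, p(a) = (count[a] + gamma) / (total + gamma * |A|).
    Returns (symbols, normalized log loss in bits per symbol).
    """
    rng = random.Random(seed)
    counts = [[0] * alphabet_size]        # per-node symbol counts; node 0 is the root
    children = [[None] * alphabet_size]   # per-node child indices
    node = 0
    symbols = []
    log_loss = 0.0
    for _ in range(n):
        c = counts[node]
        total = sum(c) + gamma * alphabet_size
        probs = [(c[a] + gamma) / total for a in range(alphabet_size)]
        # Sample the next symbol from the current node's predictive distribution.
        x = rng.choices(range(alphabet_size), weights=probs)[0]
        log_loss -= math.log2(probs[x])
        symbols.append(x)
        c[x] += 1
        child = children[node][x]
        if child is None:
            # End of an LZ78 phrase: add a new leaf and return to the root.
            counts.append([0] * alphabet_size)
            children.append([None] * alphabet_size)
            children[node][x] = len(counts) - 1
            node = 0
        else:
            node = child
    return symbols, log_loss / n


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        _, rate = sample_lz78_source(n)
        print(n, round(rate, 4))
```

Running the script prints the normalized log loss for increasing sequence lengths. Under the assumptions above, one would expect the values to settle as `n` grows, though convergence for LZ78-style schemes can be slow.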
Problem

Research questions and friction points this paper is trying to address.

Characterize entropic properties of LZ78-generated processes.
Analyze finite-state compressibility and log loss of realizations.
Evaluate sequential probability models on non-stationary data.
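As a worked illustration of how a "Jensen gap" between finite-state compressibility and entropy rate can arise, consider the following sketch. It rests on assumptions that this summary does not state: that the entropy rate takes the form of an expected entropy $\mathbb{E}[h(\Theta)]$ and the finite-state compressibility the form $h(\mathbb{E}[\Theta])$, for a binary alphabet with $\Theta$ drawn uniformly on $[0,1]$ and $h$ the binary entropy function. Because $h$ is strictly concave, Jensen's inequality makes the second quantity strictly larger.

```latex
\begin{align*}
\mathbb{E}[h(\Theta)]
  &= -2\int_0^1 \theta \log_2 \theta \, d\theta
   = \frac{1}{2\ln 2} \approx 0.721 \text{ bits}, \\
h(\mathbb{E}[\Theta])
  &= h\!\left(\tfrac{1}{2}\right) = 1 \text{ bit}, \\
h(\mathbb{E}[\Theta]) - \mathbb{E}[h(\Theta)]
  &= 1 - \frac{1}{2\ln 2} \approx 0.279 \text{ bits}.
\end{align*}
```

The first line uses the symmetry of the uniform law under $\theta \mapsto 1-\theta$ and the integral $\int_0^1 \theta \ln \theta \, d\theta = -\tfrac{1}{4}$.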
Innovation

Methods, ideas, or system contributions that make the work stand out.

A family of sources defined by the sequential probability assignments that the LZ78 universal compressor induces
Characterization of entropy and distributional properties
Simulations to demonstrate theoretical results