🤖 AI Summary
This study investigates the compression of sequences of natural numbers and the statistical-mechanical character of the resulting code lengths. Using a coding scheme based on the zeta distribution, the problem is mapped to a Hagedorn system and to a Bose gas whose energy levels are the logarithms of the prime numbers. The analysis combines the microcanonical entropy with large deviation theory, linking Hagedorn-type phase transitions to integer coding and showing that the microcanonical and canonical ensembles are only partially equivalent. The authors show that the microcanonical entropy is asymptotically linear, derive the best coding parameter in the large deviations sense, and propose a simple coding scheme for the zeta distribution that nearly achieves the ideal code length.
📝 Abstract
We study a paradigm of coding for compression of the natural numbers via the zeta distribution and develop a statistical-mechanical interpretation, both in terms of Hagedorn systems and in terms of a Bose gas with energy levels given by the logarithms of prime numbers. We also propose a simple coding scheme for the zeta distribution that nearly achieves the ideal code length. For block coding of vectors of natural numbers, we derive the micro-canonical entropy function and demonstrate its asymptotic linearity, which implies that its behavior is analogous to that of a Hagedorn system. We also derive the large deviations rate function and provide a formula for the best coding parameter in the large deviations sense. We show that, because of the Hagedorn-type phase transition, there is only partial equivalence of ensembles, owing to the degeneration of the domain of the partition function.
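The following is a minimal numerical sketch of two standard facts behind the abstract, not the authors' coding scheme. It assumes the usual zeta distribution P(n) = n^(-s) / zeta(s) for s > 1 and reads "ideal code length" as the Shannon length -log2 P(n). It also checks the Euler product, ln zeta(s) = -sum over primes p of ln(1 - p^(-s)), which has the form of a Bose gas log-partition function with energy levels ln p and with s in the role of inverse temperature (our reading of the abstract). The function names, the parameter `s`, and the truncation limits are illustrative assumptions.

```python
import math


def zeta(s, terms=10_000):
    """Approximate zeta(s) for real s > 1.

    Sums the first terms - 1 terms directly and adds a two-term
    Euler-Maclaurin estimate of the remaining tail.
    """
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    head = sum(n ** -s for n in range(1, terms))
    tail = terms ** (1.0 - s) / (s - 1.0) + 0.5 * terms ** -s
    return head + tail


def ideal_code_length(n, s):
    """Shannon code length -log2 P(n), in bits, for P(n) = n^(-s) / zeta(s)."""
    return s * math.log2(n) + math.log2(zeta(s))


def primes_up_to(limit):
    """Sieve of Eratosthenes returning all primes <= limit."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]


def log_zeta_euler(s, limit=100_000):
    """ln zeta(s) from the Euler product truncated at primes <= limit.

    Each prime p contributes -ln(1 - p^(-s)), the log-partition function
    of one bosonic mode with energy ln(p) at inverse temperature s.
    """
    return -sum(math.log1p(-(p ** -s)) for p in primes_up_to(limit))


if __name__ == "__main__":
    s = 2.0  # hypothetical coding parameter, chosen only for illustration
    for n in (1, 2, 10, 1000):
        print(n, ideal_code_length(n, s))
    print(math.log(zeta(s)), log_zeta_euler(s))
```

The code length grows like s log2(n) plus the constant log2 zeta(s). As s decreases toward 1 that constant diverges, which is the point where the partition function ceases to exist and where the Hagedorn-type behavior described in the abstract enters.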