🤖 AI Summary
This work investigates the feasibility of using large language models (LLMs) for lossless archival compression, addressing two obstacles: data irreproducibility caused by GPU nondeterminism (termed herein the "GPU butterfly effect") and high computational cost. To guarantee reproducible decompression and enable precise measurement of neural compression rates, the authors propose a Hybrid-LLM architecture combined with a novel logit quantization protocol. The study further distinguishes memory-based from prediction-based semantic compression density, quantifying the LLM's "entropic capacity" and laying groundwork for semantic file systems. Experiments achieve 0.39 and 0.75 bits per character (BPC) on memorized literature and unseen news datasets, respectively. Although inference is roughly 2,600 times slower than Zstd, the approach validates that LLMs can capture semantic redundancies inaccessible to conventional compression algorithms.
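The summary's key mechanism, logit quantization, can be illustrated with a minimal sketch. The paper's actual protocol is not reproduced here; the function names, the fixed-point scale, and the clipping range below are illustrative assumptions. The idea is that snapping float logits onto a coarse integer grid absorbs the tiny floating-point discrepancies between GPU runs, so encoder and decoder derive bit-identical probability tables:

```python
import numpy as np

def quantize_logits(logits, scale=1 << 14, clip=20.0):
    """Map float logits onto a fixed-point integer grid.

    Tiny run-to-run GPU discrepancies (much smaller than one grid
    step, 1/scale) round to the same integer, so both sides of the
    codec see identical values. scale and clip are illustrative.
    """
    clipped = np.clip(logits, -clip, clip)
    return np.round(clipped * scale).astype(np.int64)

def quantized_probs(q_logits, scale=1 << 14):
    """Recover a reproducible probability table from quantized logits,
    using float64 softmax on CPU so the arithmetic is deterministic."""
    x = q_logits.astype(np.float64) / scale
    x -= x.max()                      # standard softmax stabilization
    e = np.exp(x)
    return e / e.sum()

# Two runs whose logits differ by float noise quantize identically:
run_a = np.array([0.1, 0.2, 0.3])
run_b = run_a + 1e-6                  # simulated GPU nondeterminism
assert np.array_equal(quantize_logits(run_a), quantize_logits(run_b))
```

Perturbations larger than half a grid step can still flip an integer at a rounding boundary, which is presumably why a full protocol (rather than rounding alone) is needed.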
📝 Abstract
Large Language Models (LLMs) possess a theoretical capability to model information density far beyond the limits of classical statistical methods (e.g., Lempel-Ziv). However, utilizing this capability for lossless compression involves navigating severe system constraints, including non-deterministic hardware and prohibitive computational costs. In this work, we present an exploratory study into the feasibility of LLM-based archival systems. We introduce **Hybrid-LLM**, a proof-of-concept architecture designed to investigate the "entropic capacity" of foundation models in a storage context.
**We identify a critical barrier to deployment:** the "GPU Butterfly Effect," where microscopic hardware non-determinism precludes data recovery. We resolve this via a novel logit quantization protocol, enabling the rigorous measurement of neural compression rates on real-world data. Our experiments reveal a distinct divergence between "retrieval-based" density (0.39 BPC on memorized literature) and "predictive" density (0.75 BPC on unseen news). While current inference latency (≈2,600× slower than Zstd) limits immediate deployment to ultra-cold storage, our findings demonstrate that LLMs successfully capture semantic redundancy inaccessible to classical algorithms, establishing a baseline for future research into semantic file systems.
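The BPC figures above follow from the standard information-theoretic accounting: an entropy coder driven by a model's next-token probabilities spends about -log2(p) bits per token, so total code length divided by character count gives bits per character. A minimal sketch (the function name and toy probabilities are illustrative, not from the paper):

```python
import math

def bits_per_character(token_probs, num_chars):
    """Ideal code length under a probabilistic model.

    An arithmetic coder assigns ~ -log2(p) bits to a token the model
    predicted with probability p; dividing the total by the number of
    characters yields BPC, the metric reported above.
    """
    total_bits = -sum(math.log2(p) for p in token_probs)
    return total_bits / num_chars

# Toy example: three tokens covering a 4-character string.
# Code lengths: 1 + 2 + 2 = 5 bits over 4 chars = 1.25 BPC.
print(bits_per_character([0.5, 0.25, 0.25], 4))  # -> 1.25
```

Under this accounting, the gap between 0.39 BPC (memorized text) and 0.75 BPC (unseen text) directly reflects how much more confidently the model predicts content it has effectively stored in its weights.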