🤖 AI Summary
Addressing the fundamental trade-off between reconstruction fidelity and compression efficiency in visual tokenization and generation, this paper proposes a spherical non-parametric quantization method based on the 24-dimensional Leech lattice. It is the first work to introduce the highly symmetric Leech lattice into quantizer design, leveraging uniform sampling on the hypersphere and lattice coding theory to achieve efficient tokenization that needs neither auxiliary losses nor a lookup table. The method integrates seamlessly into both autoencoder-based and autoregressive (AR) generative frameworks. In image tokenization, it consistently outperforms BSQ, achieving notable PSNR/SSIM gains while reducing bit-rate by approximately 1.2%; in AR image generation, it lowers FID by 8.3%, markedly improving visual fidelity and structural consistency. The core innovation lies in exploiting the Leech lattice's exceptional symmetry and evenly distributed points on the hypersphere to unify non-parametric quantization methods under a lattice-coding formulation, thereby easing the classical scalar/vector quantization trade-off.
📝 Abstract
Non-parametric quantization has received much attention due to its parameter efficiency and scalability to large codebooks. In this paper, we present a unified formulation of different non-parametric quantization methods through the lens of lattice coding. The geometry of lattice codes explains the necessity of auxiliary loss terms when training auto-encoders with certain existing lookup-free quantization variants such as BSQ. As a step forward, we explore several possible candidates, including random lattices, generalized Fibonacci lattices, and densest sphere packing lattices. Among all, we find that the Leech lattice-based quantization method, dubbed Spherical Leech Quantization ($\Lambda_{24}$-SQ), leads to both a simplified training recipe and an improved reconstruction-compression trade-off thanks to its high symmetry and even distribution on the hypersphere. In image tokenization and compression tasks, this quantization approach achieves better reconstruction quality across all metrics than BSQ, the best prior art, while consuming slightly fewer bits. The improvement also extends to state-of-the-art auto-regressive image generation frameworks.
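To make the "lookup-free" setting concrete, the sketch below implements the BSQ-style baseline that the paper compares against: a latent vector is projected onto the unit hypersphere and each coordinate is snapped to $\pm 1/\sqrt{d}$, so the quantizer has $2^d$ implicit codewords and requires no stored codebook. This is a minimal illustration of the quantization style discussed here, not the paper's $\Lambda_{24}$-SQ method; the Leech-lattice nearest-point decoder is considerably more involved and is not reproduced in this text.

```python
import numpy as np

def bsq_quantize(x: np.ndarray) -> np.ndarray:
    """Lookup-free binary spherical quantization (BSQ-style sketch).

    Projects x onto the unit hypersphere, then snaps each coordinate
    to +/- 1/sqrt(d). The resulting codeword also lies on the unit
    sphere, and its index is simply the sign pattern, so no codebook
    lookup is needed.
    """
    d = x.shape[-1]
    u = x / np.linalg.norm(x, axis=-1, keepdims=True)  # project to sphere
    # np.sign(0) is 0; with continuous latents this is a measure-zero case.
    return np.sign(u) / np.sqrt(d)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 24))   # 24-dim latents, matching Lambda_24's dimension
q = bsq_quantize(x)
print(np.allclose(np.linalg.norm(q, axis=-1), 1.0))  # codewords have unit norm
```

$\Lambda_{24}$-SQ keeps the same lookup-free structure but replaces the per-coordinate sign decision with a snap to the nearest scaled Leech lattice point on the sphere, which is what yields the more even codeword distribution the abstract refers to.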