🤖 AI Summary
This work addresses the challenge of efficiently and losslessly compressing high-dimensional unit-norm embeddings. The authors propose a novel approach based on a spherical coordinate transformation, revealing that the angular components of high-dimensional unit vectors are tightly concentrated around π/2. This concentration induces strong statistical regularities: the exponent fields in the angles' IEEE 754 floating-point representations become nearly identical, and the higher-order mantissa bits become highly predictable. Leveraging these insights, the paper introduces a tailored entropy coding scheme that exploits this previously unrecognized structure. Evaluated across 26 diverse embedding configurations—including text, image, and multi-vector settings—the method achieves an average compression ratio of 1.5× with reconstruction errors below 10⁻⁷ (well beneath float32 machine epsilon), outperforming the best prior lossless techniques by 25%.
📝 Abstract
We present a compression method for unit-norm embeddings that achieves 1.5$\times$ compression, 25% better than the best prior lossless method. The method exploits the fact that the spherical coordinates of high-dimensional unit vectors concentrate around $\pi/2$, causing their IEEE 754 exponents to collapse to nearly a single value and their high-order mantissa bits to become predictable, enabling entropy coding of both. Reconstruction error is below $10^{-7}$, under float32 machine epsilon. Evaluation across 26 configurations spanning text, image, and multi-vector embeddings confirms consistent improvement.
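The concentration phenomenon the abstract relies on is easy to observe directly. The sketch below (a minimal illustration, not the paper's implementation; it assumes the standard recursive spherical-coordinate definition $\varphi_i = \arccos\!\big(x_i / \lVert(x_i,\dots,x_d)\rVert\big)$, and the dimension 768 is an illustrative choice) draws a random float32 unit vector, computes its angles, and inspects the IEEE 754 exponent fields:

```python
import numpy as np

# Random high-dimensional unit vector in float32
# (d = 768 is a common text-embedding dimension, chosen here for illustration)
rng = np.random.default_rng(0)
d = 768
x = rng.standard_normal(d).astype(np.float32)
x /= np.linalg.norm(x)

# Spherical angles: phi_i = arccos(x_i / ||(x_i, ..., x_d)||), i = 1..d-1
tail_norms = np.sqrt(np.cumsum((x[::-1] ** 2))[::-1])
phi = np.arccos(np.clip(x[:-1] / tail_norms[:-1], -1.0, 1.0))

# The angles cluster tightly around pi/2 ~= 1.5708 ...
print("mean angle:", phi.mean(), "std:", phi.std())

# ... so almost all of them fall in [1, 2), the binade with biased
# float32 exponent 127. Extract the 8-bit exponent field of each angle:
bits = phi.astype(np.float32).view(np.uint32)
exponents = (bits >> 23) & 0xFF
print("fraction with exponent 127:", np.mean(exponents == 127))
```

An entropy coder can then spend almost no bits on the (nearly constant) exponent fields and short codes on the predictable high-order mantissa bits, which is the redundancy the paper's scheme is described as exploiting.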