🤖 AI Summary
Adaptive vector quantization (AVQ) is essential for compressing gradients, weights, activations, and datasets in machine learning, yet existing methods suffer from prohibitive time and memory complexity.
Method: We propose AVQ algorithms of two kinds that overcome these bottlenecks: strictly optimal algorithms and a highly efficient near-optimal one. Our optimal algorithm employs a progressive dynamic programming framework with greedy pruning and error-bounded divide-and-conquer. For large-scale inputs, we introduce a faster near-optimal variant that relies on structural approximations backed by rigorous error analysis.
Contribution/Results: The optimal algorithm returns a provably minimum-error quantization, while the near-optimal variant keeps the added distortion controllable and achieves a 10–100× speedup with a significantly reduced memory footprint. Both can be integrated end to end into modern ML systems, making AVQ practical across training and inference pipelines.
📝 Abstract
Quantization is a fundamental optimization for many machine-learning use cases, including compressing gradients, model weights and activations, and datasets. The most accurate form of quantization is adaptive, where the error is minimized with respect to a given input, rather than optimizing for the worst case. However, optimal adaptive quantization methods are considered infeasible in terms of both their runtime and memory requirements. We revisit the Adaptive Vector Quantization (AVQ) problem and present algorithms that find optimal solutions with asymptotically improved time and space complexity. We also present an even faster near-optimal algorithm for large inputs. Our experiments show our algorithms may open the door to using AVQ more extensively in a variety of machine learning applications.
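To make "adaptive" concrete, the sketch below picks quantization levels for one specific input so that the error on that input is minimized. It is a textbook quadratic-time dynamic program, shown only as a baseline to illustrate the problem; it is not the algorithm the paper presents. Two assumptions are not stated in the abstract: the error is taken to be the total variance of unbiased stochastic rounding to the two neighbouring levels, and the levels are restricted to values that occur in the input. The names `interval_cost` and `adaptive_levels` are hypothetical.

```python
from itertools import accumulate


def interval_cost(xs, p1, p2, i, j):
    """Sum over k in [i, j] of (xs[j] - xs[k]) * (xs[k] - xs[i]).

    This is the total variance when every value between two adjacent
    levels xs[i] and xs[j] is rounded stochastically without bias
    (assumed error model). Prefix sums make it O(1).
    """
    cnt = j - i + 1
    s1 = p1[j + 1] - p1[i]  # sum of the values in the interval
    s2 = p2[j + 1] - p2[i]  # sum of their squares
    return (xs[i] + xs[j]) * s1 - cnt * xs[i] * xs[j] - s2


def adaptive_levels(values, s):
    """Return (levels, error) for the best s levels chosen among the inputs.

    Baseline DP in O(s * n^2) time and O(s * n) memory.
    """
    xs = sorted(values)
    n = len(xs)
    if n < 2 or s < 2:
        raise ValueError("need at least two values and two levels")
    s = min(s, n)
    p1 = [0.0] + list(accumulate(xs))
    p2 = [0.0] + list(accumulate(x * x for x in xs))
    inf = float("inf")
    # best[m][j]: minimal error for xs[0..j] using m levels, the last at xs[j].
    best = [[inf] * n for _ in range(s + 1)]
    prev = [[-1] * n for _ in range(s + 1)]
    best[1][0] = 0.0  # the smallest value must be a level to cover the range
    for m in range(2, s + 1):
        for j in range(1, n):
            for i in range(j):
                if best[m - 1][i] == inf:
                    continue
                cand = best[m - 1][i] + interval_cost(xs, p1, p2, i, j)
                if cand < best[m][j]:
                    best[m][j] = cand
                    prev[m][j] = i
    # Walk back from the largest value, which must also be a level.
    levels, j = [], n - 1
    for m in range(s, 0, -1):
        levels.append(xs[j])
        j = prev[m][j]
    levels.reverse()
    return levels, best[s][n - 1]
```

For the input `[0, 1, 2, 10]` with three levels, the middle level lands at 2 rather than at the midpoint of the range, because that is where the data actually lies. The quadratic dependence on the input size is the kind of cost that makes such a baseline impractical for large inputs.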