🤖 AI Summary
To address the progressive attenuation of residual magnitudes that arises when finite scalar quantization (FSQ) is used in residual quantization, this paper proposes Robust Residual Finite Scalar Quantization (RFSQ). The core innovation is the integration of learnable scaling factors and invertible LayerNorm into the FSQ framework, preserving FSQ's architectural simplicity and training stability while mitigating signal attenuation and enabling effective multi-stage quantization. RFSQ is fully compatible with standard residual quantization architectures and requires no additional hyperparameter tuning. Evaluated on ImageNet, RFSQ significantly outperforms strong baselines including VQ-EMA, vanilla FSQ, and LFQ, achieving up to a 45% improvement in perceptual loss and a 28.7% reduction in L1 reconstruction error, demonstrating substantial gains in reconstruction fidelity and compression efficiency.
📝 Abstract
Finite Scalar Quantization (FSQ) has emerged as a promising alternative to Vector Quantization (VQ) in neural compression, offering simplified training and improved stability. However, naive application of FSQ in residual quantization frameworks suffers from the **residual magnitude decay problem**, where subsequent FSQ layers receive progressively weaker signals, severely limiting their effectiveness. We propose **Robust Residual Finite Scalar Quantization (RFSQ)**, a general framework that addresses this fundamental limitation through two novel conditioning strategies: learnable scaling factors and invertible layer normalization. Our approach maintains the simplicity of FSQ while enabling effective multi-stage residual quantization. Comprehensive experiments on ImageNet demonstrate that RFSQ variants significantly outperform strong baselines including VQ-EMA, FSQ, and LFQ, achieving up to 45% improvement in perceptual loss and 28.7% reduction in L1 reconstruction error. The proposed LayerNorm strategy shows the most consistent improvements across different configurations, establishing RFSQ as a superior quantization method for neural compression.