🤖 AI Summary
This work addresses the instability and error amplification that arise when early-exit Vision Transformers are quantized to low precision: quantization noise perturbs exit decisions and degrades deployment reliability. It introduces Amortized-Precision Quantization (APQ), a utilization-aware formulation, and builds on it to propose Mutual Adaptive Quantization with Early Exiting (MAQEE). MAQEE jointly optimizes exit thresholds and bit-widths through a bi-level optimization scheme with explicit risk control, so that the quantization and exit policies adapt to each other. Evaluated across classification, detection, and segmentation tasks, the method reduces computational cost by up to 95% in bit operations (BOPs) while maintaining accuracy, and it outperforms strong baselines by up to 20%, revealing a trade-off between model depth and numerical precision.
📝 Abstract
Vision Transformers (ViTs) achieve strong performance across vision tasks, yet their deployment with low-precision early exiting remains fragile. Existing quantization methods assume static full-depth execution, making them unstable when exit decisions are perturbed by quantization noise, which can amplify errors along dynamic inference paths. In this paper, we introduce Amortized-Precision Quantization (APQ), a utilization-aware formulation that accounts for layer-wise stochastic exposure to quantization noise and reveals depth-precision trade-offs. Building on APQ, we propose Mutual Adaptive Quantization with Early Exiting (MAQEE), a bi-level framework that jointly optimizes exit thresholds and bit-widths under explicit risk control to improve inference stability. MAQEE establishes a superior Pareto frontier in the accuracy-efficiency trade-off, reducing BOPs by up to 95% while maintaining accuracy and outperforming strong baselines by up to 20% across classification, detection, and segmentation tasks.
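To make the depth-precision trade-off concrete, the sketch below estimates the expected cost in BOPs of an early-exit network by weighting each block's cost by the probability that an input actually reaches it. It is an illustration only, not the paper's APQ formulation: the function name `expected_bops`, the per-block cost model (MACs × weight bits × activation bits, a common BOPs convention), and all the numbers are assumptions made for this example.

```python
def expected_bops(macs, w_bits, a_bits, exit_probs):
    """Utilization-weighted BOPs for an early-exit network (hypothetical helper).

    macs[i]       : multiply-accumulate count of block i
    w_bits[i]     : weight bit-width of block i
    a_bits[i]     : activation bit-width of block i
    exit_probs[i] : fraction of inputs that exit right after block i
                    (assumed to sum to 1 over all blocks)
    """
    reach = 1.0  # probability that an input reaches the current block
    total = 0.0
    for m, bw, ba, p_exit in zip(macs, w_bits, a_bits, exit_probs):
        # Assumed cost model: BOPs of a block = MACs * weight bits * activation bits,
        # paid only by the inputs that get this far.
        total += reach * m * bw * ba
        reach -= p_exit  # inputs that exit here never pay for later blocks
    return total


# Toy 4-block model: 100 MACs per block, 8-bit weights and activations.
macs = [100, 100, 100, 100]
bits = [8, 8, 8, 8]

full_depth = expected_bops(macs, bits, bits, [0.0, 0.0, 0.0, 1.0])
early_exit = expected_bops(macs, bits, bits, [0.5, 0.25, 0.0, 0.25])
print(full_depth, early_exit)
```

In this toy setting, later blocks are reached by fewer inputs, so their bit-width contributes less to the expected cost than that of early blocks. This is one way to read "utilization-aware": precision and exit behavior interact, which is why tuning them jointly can matter.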