🤖 AI Summary
Deploying deep neural network (DNN) policies for resource-constrained robotic control, such as manipulator operation and autonomous driving, faces two challenges: excessive computational overhead and decision degradation induced by low-bit quantization. This paper proposes a saliency-aware, loss-weighted quantization-aware training (QAT) paradigm, the first to incorporate dynamic modeling of task-critical state saliency into QAT, integrated with imitation learning for robust low-bit compression. The method supports 4-bit weight quantization and extends to multimodal vision-language-action (VLA) models. On edge GPUs it achieves a 2.5× inference speedup and a 2.5× energy reduction, and the 4-bit robotic-arm control model incurs less than 1% accuracy drop versus full-precision baselines while maintaining full-precision performance across simulation, real-world robotic deployment, and autonomous driving tasks.
📝 Abstract
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
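The two ingredients the abstract describes, fake-quantizing weights to low-bit precision during training and up-weighting the imitation loss on saliency-identified mission-critical states, can be sketched in a few lines. This is a minimal illustration under assumed conventions, not the paper's implementation; the names `fake_quantize`, `saliency_weighted_bc_loss`, and the emphasis parameter `alpha` are hypothetical.

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Uniform symmetric fake-quantization: snap weights to a low-bit grid
    while keeping float storage (a stand-in for QAT's forward pass).
    Illustrative; real QAT uses a straight-through estimator for gradients."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 levels each side for 4-bit
    scale = np.max(np.abs(w)) / qmax or 1.0     # avoid divide-by-zero for all-zero w
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def saliency_weighted_bc_loss(pred, target, saliency, alpha=1.0):
    """Behavior-cloning MSE where each state's error is scaled by
    1 + alpha * (normalized saliency), emphasizing mission-critical states.
    `alpha` (hypothetical) controls how strongly salient states dominate."""
    s = saliency / (np.sum(saliency) + 1e-8)    # normalize scores to sum to 1
    weights = 1.0 + alpha * len(s) * s          # mean weight stays ~ 1 + alpha
    per_state = np.mean((pred - target) ** 2, axis=-1)
    return np.mean(weights * per_state)
```

In this sketch a high-saliency state with the same action error contributes more to the loss than a low-saliency one, which is the selective loss-weighting idea; plugging `fake_quantize` into the policy's forward pass during training is what makes the weighting quantization-aware.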