🤖 AI Summary
Deploying deep neural network (DNN) policies for resource-constrained robotic control, such as manipulator operation and autonomous driving, faces two challenges: excessive computational overhead and decision degradation induced by low-bit quantization. This paper proposes a saliency-aware, loss-weighted quantization-aware training (QAT) paradigm, the first to incorporate dynamic modeling of task-critical state saliency into QAT, integrated with imitation learning for robust low-bit compression. The method supports 4-bit weight quantization and extends to multimodal vision-language-action (VLA) models. On edge GPUs it achieves a 2.5× inference speedup and a 2.5× energy reduction, and the 4-bit robotic-arm control model incurs less than 1% accuracy drop versus full-precision baselines while maintaining full-precision performance across simulation, real-world robotic deployment, and autonomous driving tasks.
📝 Abstract
Deep neural network (DNN)-based policy models, such as vision-language-action (VLA) models, excel at automating complex decision-making from multi-modal inputs. However, scaling these models greatly increases computational overhead, complicating deployment in resource-constrained settings like robot manipulation and autonomous driving. To address this, we propose Saliency-Aware Quantized Imitation Learning (SQIL), which combines quantization-aware training with a selective loss-weighting strategy for mission-critical states. By identifying these states via saliency scores and emphasizing them in the training loss, SQIL preserves decision fidelity under low-bit precision. We validate SQIL's generalization capability across extensive simulation benchmarks with environment variations, real-world tasks, and cross-domain tasks (self-driving, physics simulation), consistently recovering full-precision performance. Notably, a 4-bit weight-quantized VLA model for robotic manipulation achieves up to 2.5x speedup and 2.5x energy savings on an edge GPU with minimal accuracy loss. These results underline SQIL's potential for efficiently deploying large IL-based policy models on resource-limited devices.
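The two ingredients the abstract describes, fake-quantizing weights to low-bit precision during training and up-weighting the imitation loss on saliency-identified mission-critical states, can be sketched in a few lines. This is a minimal illustration under assumed conventions, not the paper's implementation; the names `fake_quantize`, `saliency_weighted_bc_loss`, and the emphasis parameter `alpha` are hypothetical.

```python
import numpy as np

def fake_quantize(w, num_bits=4):
    """Uniform symmetric fake-quantization: snap weights to a low-bit grid
    while keeping float storage (a stand-in for QAT's forward pass).
    Illustrative; real QAT uses a straight-through estimator for gradients."""
    qmax = 2 ** (num_bits - 1) - 1              # e.g. 7 levels each side for 4-bit
    scale = np.max(np.abs(w)) / qmax or 1.0     # avoid divide-by-zero for all-zero w
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def saliency_weighted_bc_loss(pred, target, saliency, alpha=1.0):
    """Behavior-cloning MSE where each state's error is scaled by
    1 + alpha * (normalized saliency), emphasizing mission-critical states.
    `alpha` (hypothetical) controls how strongly salient states dominate."""
    s = saliency / (np.sum(saliency) + 1e-8)    # normalize scores to sum to 1
    weights = 1.0 + alpha * len(s) * s          # mean weight stays ~ 1 + alpha
    per_state = np.mean((pred - target) ** 2, axis=-1)
    return np.mean(weights * per_state)
```

In this sketch a high-saliency state with the same action error contributes more to the loss than a low-saliency one, which is the selective loss-weighting idea; plugging `fake_quantize` into the policy's forward pass during training is what makes the weighting quantization-aware.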