HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models

📅 2026-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of error accumulation in 1-bit quantized vision-language-action (VLA) models, which arises from distributional mismatches in weights and severely degrades long-horizon closed-loop control performance. To mitigate this, the authors propose HBVLA, a binary VLA framework that, for the first time, integrates policy-aware Hessian analysis to identify critical weights, applies sparse orthogonal transformations to non-critical weights, and performs grouped 1-bit quantization in the Haar domain to minimize distributional divergence between full-precision and quantized models. Evaluated on LIBERO and SimplerEnv, HBVLA retains 92.2% and 93.6% of the original performance, respectively, substantially outperforming existing binarization methods. Real-world robot experiments further demonstrate that HBVLA achieves success rates on par with full-precision models, confirming its reliability for deployment on resource-constrained platforms.
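The policy-aware Hessian itself is not spelled out in this summary; as a rough illustration of Hessian-guided saliency, here is a minimal OBQ/GPTQ-style sketch, assuming a linear layer and calibration activations collected from policy rollouts. The names `hessian_saliency`, `split_salient`, and `keep_ratio` are hypothetical, not from the paper.

```python
import torch

def hessian_saliency(weight: torch.Tensor, calib_acts: torch.Tensor,
                     eps: float = 1e-6) -> torch.Tensor:
    """Score each weight by a diagonal-Hessian proxy of its loss impact.

    For a linear layer y = x @ W.T with squared reconstruction error, the
    Hessian w.r.t. each output row of W is H = 2 * X.T @ X, so the diagonal
    entry H_jj measures sensitivity to input channel j. A standard proxy
    scores each weight as w_ij^2 * H_jj.
    """
    # calib_acts: (num_tokens, in_features) activations from calibration rollouts
    h_diag = 2.0 * calib_acts.pow(2).sum(dim=0) + eps   # (in_features,)
    return weight.pow(2) * h_diag                       # broadcast to (out, in)

def split_salient(saliency: torch.Tensor, keep_ratio: float = 0.01) -> torch.Tensor:
    """Boolean mask marking the top keep_ratio fraction of weights as critical."""
    k = max(1, int(keep_ratio * saliency.numel()))
    thresh = saliency.flatten().topk(k).values.min()
    return saliency >= thresh
```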

📝 Abstract
Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing methods fail to narrow the distribution gap between binarized and full-precision weights, causing quantization errors to accumulate under long-horizon closed-loop execution and severely degrade the generated actions. To fill this gap, we propose HBVLA, a VLA-tailored binarization framework. First, we use a policy-aware enhanced Hessian to identify weights that are truly critical for action generation. Then, we employ a sparse orthogonal transform for non-salient weights to induce a low-entropy intermediate state. Finally, we quantize both salient and non-salient weights in the Haar domain with group-wise 1-bit quantization. We evaluate our approach on different VLAs: on LIBERO, quantized OpenVLA-OFT retains 92.2% of full-precision performance; on SimplerEnv, quantized CogACT retains 93.6%, significantly outperforming state-of-the-art binarization methods. We further validate our method on a real-world evaluation suite, where HBVLA incurs only marginal success-rate degradation compared to the full-precision model, demonstrating robust deployability under tight hardware constraints. Our work provides a practical foundation for ultra-low-bit quantization of VLAs, enabling more reliable deployment on hardware-limited robotic platforms.
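The abstract's Haar-domain step can be made concrete with a small sketch. This is not the authors' released code; it assumes standard BWN-style binarization (per-group sign codes plus one full-precision scale) applied after an orthonormal Haar transform, with `haar_matrix` and `binarize_haar_groups` as hypothetical helper names.

```python
import torch

def haar_matrix(n: int) -> torch.Tensor:
    """Orthonormal Haar transform matrix of size n (n must be a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "size must be a power of 2"
    H = torch.ones(1, 1)
    while H.shape[0] < n:
        top = torch.kron(H, torch.tensor([[1.0, 1.0]]))
        bot = torch.kron(torch.eye(H.shape[0]), torch.tensor([[1.0, -1.0]]))
        H = torch.cat([top, bot], dim=0) / 2 ** 0.5
    return H  # orthonormal: H @ H.T == I, so the inverse transform is H.T

def binarize_haar_groups(weight: torch.Tensor, group: int = 64) -> torch.Tensor:
    """Group-wise 1-bit quantization of a weight matrix in the Haar domain."""
    out_f, in_f = weight.shape
    assert in_f % group == 0
    H = haar_matrix(group)
    w = weight.reshape(out_f, in_f // group, group)
    t = w @ H.T                                  # per-group forward Haar transform
    alpha = t.abs().mean(dim=-1, keepdim=True)   # per-group fp scale (BWN-style)
    t_hat = alpha * t.sign()                     # stored as 1-bit signs + one scale
    return (t_hat @ H).reshape(out_f, in_f)      # inverse transform back to weights
```

Because the Haar matrix is orthonormal, transforming, binarizing, and inverting in this way keeps the quantization error measured in the transform domain equal to the error in the weight domain, which is the usual motivation for quantizing in such a basis.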
Problem

Research questions and friction points this paper is trying to address.

Vision-Language-Action models
1-bit quantization
quantization error
distribution gap
embodied control
Innovation

Methods, ideas, or system contributions that make the work stand out (a toy sketch of the sparse orthogonal transform follows this list).

1-bit quantization
Vision-Language-Action models
Hessian-aware binarization
sparse orthogonal transform
Haar domain quantization
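A minimal sketch of one way to realize a sparse orthogonal transform, assuming it is composed of Givens rotations (each an identity matrix with a 2x2 rotation on one coordinate pair, hence sparse and exactly invertible). The rotation schedule `rotations` is a free parameter here, not the paper's learned or derived choice.

```python
import math
import torch

def givens_rotation(n: int, i: int, j: int, theta: float) -> torch.Tensor:
    """Identity matrix with a 2x2 rotation on coordinates (i, j)."""
    G = torch.eye(n)
    c, s = math.cos(theta), math.sin(theta)
    G[i, i] = G[j, j] = c
    G[i, j], G[j, i] = -s, s
    return G

def sparse_orthogonal_transform(weight: torch.Tensor, rotations):
    """Rotate the input dimension of `weight` by a product of Givens rotations.

    Each rotation mixes only two channels, so the composed Q stays sparse and
    exactly invertible (Q^-1 = Q.T); the rotation can therefore be compensated
    on the activation path so the network output is unchanged.
    """
    n = weight.shape[1]
    Q = torch.eye(n)
    for i, j, theta in rotations:           # rotations: iterable of (i, j, angle)
        Q = givens_rotation(n, i, j, theta) @ Q
    return weight @ Q.T, Q                  # quantize weight @ Q.T; undo with Q
```

The design appeal of such a transform is that orthogonality preserves the layer's function exactly while reshaping the weight distribution seen by the quantizer, and sparsity keeps the extra compute negligible.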