DyQ-VLA: Temporal-Dynamic-Aware Quantization for Embodied Vision-Language-Action Models

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing static quantization methods struggle to accommodate the dynamic precision requirements of embodied vision-language-action (VLA) models across different temporal stages, leading to performance degradation and inefficient resource utilization. This work proposes DyQ-VLA, the first quantization framework that incorporates temporal dynamics awareness. By leveraging a kinematic proxy to trigger real-time bit-width switching and integrating a kinematics-guided module for dynamically allocating optimal bit widths, DyQ-VLA enables sensitivity-aware, stage-adaptive quantization. The approach achieves remarkable efficiency: it retains 99.5% of the original model performance while occupying only 30.9% of the memory footprint, delivers a 1.49× speedup in simulation, and accelerates real-world robotic tasks by up to 1.43×.
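The sensitivity-aware switching described above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the choice of kinematic proxy (mean end-effector speed here), the bit-width tiers, and the thresholds are all assumptions made for the example.

```python
import numpy as np

# Hypothetical bit-width tiers: higher precision for slow, fine manipulation;
# lower precision for fast, gross transport motion. Thresholds are illustrative.
BIT_TIERS = [(0.02, 8), (0.15, 6), (float("inf"), 4)]  # (max speed in m/s, bits)

def select_bitwidth(ee_positions, dt=0.05):
    """Pick a quantization bit-width from a kinematic proxy.

    ee_positions: recent end-effector positions, shape (T, 3), one per timestep.
    dt: control period in seconds.
    Returns the bit-width to use for the next inference step.
    """
    # Proxy: mean end-effector speed over the recent window.
    speeds = np.linalg.norm(np.diff(ee_positions, axis=0), axis=1) / dt
    proxy = speeds.mean()
    for max_speed, bits in BIT_TIERS:
        if proxy <= max_speed:
            return bits

# Slow, precise motion selects a high bit-width; fast transport a low one.
slow = np.cumsum(np.full((10, 3), 0.0005), axis=0)  # ~0.017 m/s
fast = np.cumsum(np.full((10, 3), 0.02), axis=0)    # ~0.69 m/s
print(select_bitwidth(slow), select_bitwidth(fast))  # 8 4
```

In an actual deployment the returned bit-width would index into pre-quantized weight copies or a runtime requantization path; that machinery is omitted here.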

📝 Abstract
Vision-Language-Action (VLA) models are dominant in embodied intelligence but are constrained by inference overheads. While model quantization alleviates these bottlenecks for edge deployment, static quantization approaches remain suboptimal for VLAs due to two critical challenges: (1) Temporal-dynamic sensitivity, where fixed precision wastes resources by ignoring stage-varying error tolerances; and (2) Real-time allocation, where identifying real-time sensitivity to guide bit allocation remains unsolved. To address these challenges, we propose DyQ-VLA, a dynamic quantization framework for VLAs. Specifically, a sensitivity-aware switching strategy leverages real-time kinematic proxies to trigger the bit-width switch, while a kinematic-guided module dynamically allocates the optimal bit-width. Experiments show that DyQ-VLA requires only 30.9% of the original memory footprint while maintaining 99.5% of its original performance, achieving 1.49x simulation and up to 1.43x real-world speedups.
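As a rough back-of-envelope check on how mixed bit-widths translate into a memory footprint like the reported 30.9%, the arithmetic is simply a parameter-weighted average of bits relative to the baseline precision. The FP16 baseline and the 70/30 layer split below are assumptions for illustration, not the paper's actual allocation.

```python
# Illustrative mixed-precision memory estimate against an FP16 baseline.
# The layer split and bit-widths are assumed, not taken from the paper.
FP16_BITS = 16

def relative_footprint(allocation):
    """allocation: list of (fraction_of_params, bits) pairs summing to 1.0."""
    return sum(frac * bits for frac, bits in allocation) / FP16_BITS

# e.g. mostly 4-bit weights with some sensitive layers kept at 8-bit
# (quantization scales/zero-points and activations ignored):
mix = [(0.7, 4), (0.3, 8)]
print(f"{relative_footprint(mix):.1%}")  # 0.7*4 + 0.3*8 = 5.2 bits -> 32.5%
```

A figure in the neighborhood of 30% thus corresponds to an average of roughly 5 bits per weight against FP16, which is consistent in magnitude with the paper's reported footprint.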
Problem

Research questions and friction points this paper is trying to address.

Temporal-dynamic sensitivity
Real-time allocation
Model quantization
Vision-Language-Action models
Bit-width allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic quantization
Temporal-dynamic awareness
Vision-Language-Action models
Bit-width allocation
Embodied intelligence
Zihao Zheng
Peking University
Machine Learning System · Edge Computing · Computer Architecture · EDA
Hangyu Cao
School of Software Engineering, South China University of Technology, Guangzhou, China
Sicheng Tian
School of Artificial Intelligence, Beijing Normal University, Beijing, China
Jiayu Chen
PhD student, IFLab@PKU
Efficient Visual Generation · ML Systems
Maoliang Li
School of Computer Science, Peking University, Beijing, China
Xinhao Sun
School of Electronics Engineering and Computer Science, Peking University, Beijing, China
Hailong Zou
School of Computer Science, Peking University, Beijing, China
Zhaobo Zhang
School of Computer Science, Peking University, Beijing, China
Xuanzhe Liu
Boya Distinguished Professor, Peking University, ACM Distinguished Scientist
Machine Learning System · Mobile Computing System · Serverless Computing
Donggang Cao
School of Computer Science, Peking University, Beijing, China
Hong Mei
Peking University
Software Engineering · System Software · Data Analytics
Xiang Chen
School of Computer Science, Peking University, Beijing, China