🤖 AI Summary
This work addresses the poor generalizability, redundant retraining requirements, and inefficiency of vision-based motor policies when deployed across heterogeneous robotic hardware. To this end, we propose DC-QFA, a unified framework featuring the first device-conditioned Once-for-All supernetwork architecture. Our approach integrates device-aware quantization-aware training, latency- and memory-aware regularization, lookup-table-based hardware-constrained neural architecture search, and multi-step on-policy distillation, enabling a single training run to produce lightweight policies adaptable to diverse platforms. Experiments demonstrate that DC-QFA achieves 2–3× speedup on edge devices, consumer-grade GPUs, and cloud platforms with negligible degradation in task success rates. Furthermore, real-robot evaluations confirm the long-term stability of low-bit policies in contact-intensive manipulation tasks.
📝 Abstract
The growing complexity of visuomotor policies makes them difficult to deploy under heterogeneous robotic hardware constraints. Most existing model-efficiency approaches for robotic manipulation are device- and model-specific, generalize poorly, and require time-consuming per-device optimization during adaptation. In this work, we propose a unified framework named \textbf{D}evice-\textbf{C}onditioned \textbf{Q}uantization-\textbf{F}or-\textbf{A}ll (DC-QFA), which amortizes deployment effort through device-conditioned quantization-aware training and hardware-constrained architecture search. Specifically, we introduce a single supernet that spans a rich design space over network architectures and mixed-precision bit-widths, optimized with latency- and memory-aware regularization guided by per-device lookup tables. With this supernet, for each target platform we perform a once-for-all lightweight search to select an optimal subnet without any per-device re-optimization, which enables more generalizable deployment across heterogeneous hardware and substantially reduces deployment time. To improve long-horizon stability under low precision, we further introduce multi-step on-policy distillation to mitigate error accumulation during closed-loop execution. Extensive experiments on three representative policy backbones, namely DiffusionPolicy-T, MDT-V, and OpenVLA-OFT, demonstrate that DC-QFA achieves $2\text{-}3\times$ acceleration on edge devices, consumer-grade GPUs, and cloud platforms, with a negligible drop in task success. Real-world evaluations on an Inovo robot equipped with a force/torque sensor further validate that our low-bit DC-QFA policies maintain stable, contact-rich manipulation even under severe quantization.
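To make the lookup-table-guided subnet search more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy design space of three layers, each with a width and a bit-width choice, and a hypothetical per-device table of per-layer latency and memory costs. The device names, all cost values, the exhaustive enumeration, and the `proxy_score` function (a stand-in for evaluating a subnet with weights inherited from the supernet) are illustrative assumptions.

```python
from itertools import product

# Toy design space (assumed): each layer picks a width and a bit-width.
WIDTHS = (64, 128)
BITS = (4, 8)
NUM_LAYERS = 3

# Hypothetical per-device lookup tables:
# LUTS[device][(width, bits)] = (latency_ms, memory_mb) for one layer.
LUTS = {
    "edge_device": {(64, 4): (1.0, 0.5), (64, 8): (2.0, 1.0),
                    (128, 4): (2.0, 1.0), (128, 8): (4.0, 2.0)},
    "cloud_gpu": {(64, 4): (0.2, 0.5), (64, 8): (0.3, 1.0),
                  (128, 4): (0.3, 1.0), (128, 8): (0.5, 2.0)},
}


def estimate_cost(subnet, device):
    """Sum per-layer table entries instead of profiling on the device."""
    lut = LUTS[device]
    latency_ms = sum(lut[layer][0] for layer in subnet)
    memory_mb = sum(lut[layer][1] for layer in subnet)
    return latency_ms, memory_mb


def proxy_score(subnet):
    """Placeholder quality proxy (assumption): more capacity scores higher."""
    return sum(width * bits for width, bits in subnet)


def search_subnet(device, max_latency_ms, max_memory_mb):
    """Return the best-scoring subnet within both budgets, or None."""
    layer_choices = list(product(WIDTHS, BITS))
    best, best_score = None, float("-inf")
    for subnet in product(layer_choices, repeat=NUM_LAYERS):
        latency_ms, memory_mb = estimate_cost(subnet, device)
        if latency_ms > max_latency_ms or memory_mb > max_memory_mb:
            continue  # violates the hardware constraint for this device
        score = proxy_score(subnet)
        if score > best_score:
            best, best_score = subnet, score
    return best


if __name__ == "__main__":
    for device, budget in [("edge_device", (6.0, 3.0)), ("cloud_gpu", (10.0, 10.0))]:
        chosen = search_subnet(device, *budget)
        print(device, chosen, estimate_cost(chosen, device))
```

Because costs are read from a table, the same search routine serves every device by swapping in that device's table and budgets, and no retraining takes place inside the loop. A realistic design space would be far too large to enumerate, so a practical search would likely use an evolutionary or other sampling-based strategy instead.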