🤖 AI Summary
This work addresses the safety risks of dual-arm manipulation under vision-language instructions, where unmodeled self-collisions, between the arms themselves or with grasped objects, undermine safe deployment. The authors propose a risk-aware, end-to-end Vision-Language-Action (VLA) framework that integrates, for the first time, a short-horizon self-collision risk estimator with a VLA model. By fusing proprioceptive and visual embeddings with the planned actions, the system predicts high-risk commands and halts their execution in real time. It also incorporates a risk-guided state-recovery and policy-optimization mechanism. The method is pretrained in simulation using model-based collision labels and fine-tuned on a real PiPER dual-arm robot. Across five dual-arm tasks, it significantly reduces self-collision rates and achieves higher task success than existing approaches such as RDT and APEX.
📝 Abstract
Vision-Language-Action (VLA) models enable instruction-following manipulation, yet dual-arm deployment remains unsafe due to under-modeled self-collisions between arms and grasped objects. We introduce CoFreeVLA, which augments an end-to-end VLA with a short-horizon self-collision risk estimator that predicts collision likelihood from proprioception, visual embeddings, and planned actions. The estimator gates risky commands, recovers to safe states via risk-guided adjustments, and shapes policy refinement for safer rollouts. It is pre-trained with model-based collision labels and post-trained on real-robot rollouts for calibration. On five bimanual tasks with the dual-arm PiPER robot, CoFreeVLA reduces self-collisions and improves success rates versus RDT and APEX.
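The gate-then-recover loop described in the abstract can be sketched minimally as follows. This is an illustrative assumption, not the paper's architecture: the tiny MLP, input dimensions, class names (`RiskEstimator`, `gated_step`), and the 0.5 threshold are all hypothetical placeholders for the learned risk head and risk-guided recovery the authors describe.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class RiskEstimator:
    """Hypothetical stand-in for the short-horizon risk head: fuses the
    proprioceptive state, a visual embedding, and the planned action chunk,
    and outputs a self-collision probability in [0, 1]."""

    def __init__(self, in_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        # Randomly initialized weights; in the paper this head is trained
        # on model-based collision labels and calibrated on real rollouts.
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0.0, 0.1, (hidden, 1))
        self.b2 = np.zeros(1)

    def risk(self, proprio, visual_emb, planned_actions):
        # Concatenate all inputs into one feature vector, then a 2-layer MLP.
        x = np.concatenate([proprio, visual_emb, np.ravel(planned_actions)])
        h = np.tanh(x @ self.w1 + self.b1)
        return float(sigmoid(h @ self.w2 + self.b2))


def gated_step(action, risk, threshold=0.5, recovery_action=None):
    """Execute the VLA's action only when predicted risk is below the
    threshold; otherwise substitute a risk-guided recovery action."""
    return action if risk < threshold else recovery_action
```

A usage sketch: at each control step the VLA proposes an action chunk, the estimator scores it, and `gated_step` either forwards the chunk to the robot or swaps in a recovery motion toward a safe state.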