🤖 AI Summary
Current vision-language-action (VLA) models lack force perception and compliance control in contact-intensive tasks, often leading to unsafe interactions or task failure. This work proposes a novel approach that integrates vision-language models (VLMs) with variable impedance control (VIC): the VLM interprets task context to dynamically adjust the stiffness and damping parameters of the VIC controller, while real-time force/torque feedback keeps interaction forces within safe thresholds. To the best of our knowledge, this is the first method to enable force-aware adaptation and safe physical interaction in VLA systems during contact-rich scenarios. Experimental results in both simulation and on a real robot demonstrate significant improvements over existing baselines, raising the overall task success rate from 9.86% to 17.29% and substantially reducing force-limit violations.
📝 Abstract
We propose the CompliantVLA-adaptor, which augments state-of-the-art Vision-Language-Action (VLA) models with vision-language model (VLM)-informed, context-aware variable impedance control (VIC) to improve the safety and effectiveness of contact-rich robotic manipulation. Existing VLA systems (e.g., RDT, Pi0, OpenVLA-oft) typically output position commands but lack force-aware adaptation, leading to unsafe or failed interactions in physical tasks involving contact, compliance, or uncertainty. In the proposed CompliantVLA-adaptor, a VLM interprets task context from images and natural language to adapt the stiffness and damping parameters of a VIC controller. These parameters are further regulated using real-time force/torque feedback to ensure interaction forces remain within safe thresholds. We demonstrate that our method outperforms the VLA baselines on a suite of complex contact-rich tasks, both in simulation and on real hardware, with improved success rates and reduced force violations. The overall success rate across all tasks increases from 9.86% to 17.29%, presenting a promising path towards safe contact-rich manipulation using VLAs. We release our code, prompts, and force-torque-impedance-scenario context datasets at https://sites.google.com/view/compliantvla.
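To make the control idea concrete, here is a minimal sketch of context-aware variable impedance control with a force-limit safeguard. It is not the paper's implementation: the context-to-gain table, gain values, and the 15 N force threshold are illustrative assumptions standing in for the VLM-informed adaptation and the paper's actual safety limits.

```python
import numpy as np

def impedance_force(k, d, x_des, x, v_des, v):
    """Cartesian impedance law: F = K (x_d - x) + D (v_d - v)."""
    return k * (x_des - x) + d * (v_des - v)

def clamp_force(f, f_max):
    """Scale the commanded wrench so its norm stays within the safe threshold."""
    norm = np.linalg.norm(f)
    return f if norm <= f_max else f * (f_max / norm)

# Hypothetical context-to-gain map: in the paper, a VLM infers the task
# context and selects stiffness/damping; these entries are invented examples.
CONTEXT_GAINS = {
    "free_motion": (400.0, 40.0),  # stiff tracking away from contact
    "insertion":   (150.0, 25.0),  # moderate compliance near contact
    "wiping":      (60.0, 15.0),   # soft, compliant surface contact
}

def compliant_command(context, x_des, x, v_des, v, f_max=15.0):
    """Pick gains for the inferred context, then clamp the resulting force."""
    k, d = CONTEXT_GAINS[context]
    f = impedance_force(k, d, np.asarray(x_des, float), np.asarray(x, float),
                        np.asarray(v_des, float), np.asarray(v, float))
    return clamp_force(f, f_max)
```

For the same 10 cm position error, the "wiping" context yields a much gentler commanded force than "free_motion", and any command exceeding the threshold is scaled back rather than applied, which is the behavior the force/torque feedback loop is meant to guarantee.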