ForceVLA2: Unleashing Hybrid Force-Position Control with Force Awareness for Contact-Rich Manipulation

📅 2026-03-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of existing contact-rich manipulation approaches, which predominantly rely on position control and lack explicit perception and regulation of interaction forces, resulting in limited stability, precision, and robustness. To overcome these challenges, the authors propose the first end-to-end vision-language-action framework that integrates explicit force awareness. The method dynamically fuses multimodal inputs (visual observations, language instructions, proprioceptive states, and force signals) through force-aware prompting and a Cross-Scale Mixture-of-Experts (MoE) mechanism, enabling closed-loop hybrid force-position control. Evaluated across five contact-rich manipulation tasks, the approach achieves success rates 48.0% and 35.0% higher than those of Pi0 and Pi0.5, respectively, while significantly mitigating common failure modes such as robotic arm overload and unstable contact interactions.
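The paper does not publish implementation details here, but the core idea (a gating network, conditioned on interaction forces, that mixes experts fusing task-concept features with real-time force signals into a hybrid force-position action) can be illustrated with a minimal sketch. All names, dimensions, and the action layout below are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class CrossScaleMoE:
    """Toy Mixture-of-Experts sketch (hypothetical, linear experts).

    A gate conditioned on the current force reading routes among experts;
    each expert maps the fused (task-concept + force) feature to a hybrid
    action [delta_position (3) | target_force (3)].
    """
    def __init__(self, feat_dim, force_dim=3, n_experts=4, act_dim=6):
        in_dim = feat_dim + force_dim
        self.gate_w = rng.normal(0.0, 0.1, (force_dim, n_experts))
        self.expert_w = rng.normal(0.0, 0.1, (n_experts, in_dim, act_dim))

    def __call__(self, task_feat, force):
        x = np.concatenate([task_feat, force])    # fuse concept + force signal
        gates = softmax(force @ self.gate_w)      # force-conditioned routing
        expert_out = np.einsum("i,eio->eo", x, self.expert_w)
        return gates @ expert_out                 # weighted expert mixture

moe = CrossScaleMoE(feat_dim=16)
# e.g. a 5 N normal force along z while wiping a surface
action = moe(rng.normal(size=16), np.array([0.0, 0.0, 5.0]))
print(action.shape)  # (6,): [dx, dy, dz, fx, fy, fz]
```

In the actual system the experts and gate would be learned end-to-end inside the action expert of the VLA, with the task-concept features coming from the force-prompted VLM; the sketch only shows the routing-and-fusion pattern.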

📝 Abstract
Embodied intelligence for contact-rich manipulation has predominantly relied on position control, while explicit awareness and regulation of interaction forces remain under-explored, limiting stability, precision, and robustness in real-world tasks. We propose ForceVLA2, an end-to-end vision-language-action framework that equips robots with hybrid force-position control and explicit force awareness. ForceVLA2 introduces force-based prompts into the VLM expert to construct force-aware task concepts across stages, and employs a Cross-Scale Mixture-of-Experts (MoE) in the action expert to adaptively fuse these concepts with real-time interaction forces for closed-loop hybrid force-position regulation. To support learning and evaluation, we construct ForceVLA2-Dataset, containing 1,000 trajectories over 5 contact-rich tasks, including wiping, pressing, and assembling, with multi-view images, task prompts, proprioceptive state, and force signals. Extensive experiments show that ForceVLA2 substantially improves success rates and reliability in contact-rich manipulation, outperforming pi0 and pi0.5 by 48.0% and 35.0%, respectively, across the 5 tasks, and mitigating common failure modes such as arm overload and unstable contact, thereby actively advancing force-aware interactive physical intelligence in VLAs. The project page is available at https://sites.google.com/view/force-vla2/home.
Problem

Research questions and friction points this paper is trying to address.

contact-rich manipulation
force awareness
hybrid force-position control
embodied intelligence
interaction forces
Innovation

Methods, ideas, or system contributions that make the work stand out.

force-aware manipulation
hybrid force-position control
vision-language-action (VLA)
Mixture-of-Experts (MoE)
contact-rich tasks
Authors

Yang Li
Tongji University; Shanghai AI Laboratory; Lumos Robotics
Zhaxizhuoma
Shanghai Jiao Tong University; Shanghai AI Laboratory; Lumos Robotics
Hongru Jiang
Shanghai Jiao Tong University
Junjie Xia
Shanghai AI Laboratory; Lumos Robotics
Hongquan Zhang
East China Normal University; Shanghai Innovation Institute; Shanghai AI Laboratory
Jinda Du
Shanghai Jiao Tong University
Yunsong Zhou
Shanghai Jiao Tong University (Embodied AI; Generative Models)
Jia Zeng
Shanghai AI Laboratory (Embodied AI; Robotic Manipulation; Vision-Language-Action)
Ce Hao
National University of Singapore
Jieji Ren
Shanghai Jiao Tong University
Qiaojun Yu
Shanghai Jiao Tong University; Shanghai AI Lab (robotic learning; 3D vision; VLA)
Cewu Lu
Shanghai Jiao Tong University; Shanghai Innovation Institute; Noematrix Intelligence
Yu Qiao
Professor, Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS (Computer Vision; Pattern Recognition; Large Multimodal Model; Large Language Model)
Jiangmiao Pang
Shanghai AI Laboratory