ForceFlow: Learning to Feel and Act via Contact-Driven Flow Matching

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of high-precision force control in contact-rich robotic manipulation, where complex contact dynamics hinder performance. To this end, we propose ForceFlow, a framework that constructs force-aware reactive policies through flow matching and employs a hierarchical architecture comprising a vision-dominated approach phase and a tactile-dominated interaction phase. A key innovation is the Vision-to-Force (V2F) mechanism, which decouples spatial generalization from contact regulation by treating force signals as global modulation factors. We further introduce an asymmetric multimodal fusion strategy and a joint prediction paradigm that, for the first time in imitation learning, explicitly separates visual localization from force control execution. Evaluated on six real-world tasks, ForceFlow achieves a 37% higher success rate than the ForceVLA baseline while demonstrating accurate force prediction, robust contact self-regulation, and strong zero-shot out-of-distribution generalization.
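The idea of treating force as a global modulation factor can be illustrated with a FiLM-style scale-and-shift layer. This is only a sketch under stated assumptions: the summary does not say which modulation operator ForceFlow uses, so FiLM is a stand-in, and the function name `film_modulate`, the feature sizes, and the random weights are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_modulate(visual_feat, force_feat, w_gamma, w_beta):
    """Scale and shift visual features with a force embedding (FiLM-style).

    Assumption: this operator is illustrative and is not taken from the paper.
    The fusion is asymmetric in the sense that force conditions the visual
    features, while vision never modifies the force input.
    """
    gamma = 1.0 + force_feat @ w_gamma  # per-channel scale, identity at zero force
    beta = force_feat @ w_beta          # per-channel shift, zero at zero force
    return gamma * visual_feat + beta

# Hypothetical sizes: batch of 4, 16 visual channels, 6-axis force/torque reading.
visual_feat = rng.normal(size=(4, 16))
force_feat = rng.normal(size=(4, 6))
w_gamma = rng.normal(scale=0.1, size=(6, 16))
w_beta = rng.normal(scale=0.1, size=(6, 16))

fused = film_modulate(visual_feat, force_feat, w_gamma, w_beta)
no_contact = film_modulate(visual_feat, np.zeros((4, 6)), w_gamma, w_beta)
```

With a zero force reading the layer reduces to the identity, so in this sketch the visual pathway is left unchanged until contact produces a nonzero signal.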

📝 Abstract

Existing imitation learning methods enable robots to interact autonomously with the physical environment. However, contact-rich manipulation tasks remain a significant challenge due to complex contact dynamics that demand high-precision force feedback and control. Although recent efforts have attempted to integrate force/torque sensing into policies, how to build a simple yet effective framework that achieves robust generalization under multimodal observations remains an open question. In this paper, we propose ForceFlow, a force-aware reactive framework built upon flow matching. For contact-stage policy design, we investigate force signal fusion mechanisms and adopt an asymmetric multimodal fusion architecture that treats force as a global regulatory signal, combined with a joint prediction paradigm that enhances the policy's understanding of instantaneous force and historical information, thereby achieving deep coupling between force and motion. For task-level hierarchical decomposition, we divide manipulation into a vision-dominant approach stage (VLM-based pointing for target localization) and a touch-dominant interaction stage (force-driven contact execution), with a Vision-to-Force (V2F) handover mechanism that explicitly decouples spatial generalization from contact regulation. Experimental results across six real-world contact-rich tasks demonstrate that ForceFlow achieves a 37% success rate improvement over the strong baseline ForceVLA while maintaining significantly lower cost. Moreover, ForceFlow exhibits accurate force signal prediction and demonstrates superior performance in contact force self-regulation and zero-shot out-of-distribution (OOD) generalization.
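For readers unfamiliar with flow matching, the following minimal sketch shows the two pieces such a policy typically needs: a training target and a sampler. It assumes the common linear interpolation path between noise and the demonstrated output and a fixed-step Euler solver; the abstract does not specify ForceFlow's path, solver, or network, so the helper names (`cfm_training_pair`, `euler_sample`, `oracle_velocity`) and the 7-dimensional target are hypothetical. A closed-form velocity field stands in for the learned network.

```python
import numpy as np

def cfm_training_pair(target, noise, t):
    """Return the point x_t on a straight noise-to-target path and its velocity.

    A network v(x_t, t, obs) would be regressed onto `velocity` with an MSE loss.
    """
    x_t = (1.0 - t) * noise + t * target
    velocity = target - noise  # constant along a straight path
    return x_t, velocity

def euler_sample(velocity_fn, noise, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = noise.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x

def oracle_velocity(x, t, target):
    """Exact velocity for a single known target; stands in for a trained model."""
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
# Hypothetical joint output: 6 motion dimensions plus 1 predicted contact force.
target = np.array([0.1, -0.2, 0.05, 0.0, 0.0, 0.3, 4.5])
noise = rng.normal(size=target.shape)

x_half, v_half = cfm_training_pair(target, noise, t=0.5)
sample = euler_sample(oracle_velocity, noise, target, steps=10)
```

Packing the predicted force into the same vector as the motion command is one simple way to read the "joint prediction" described above, although the paper may structure its outputs differently.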

Problem

Research questions and friction points this paper is trying to address.

imitation learning

contact-rich manipulation

force feedback

multimodal fusion

generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching

force-aware policy

asymmetric multimodal fusion

Vision-to-Force handover

contact-rich manipulation
