Referring-Aware Visuomotor Policy Learning for Closed-Loop Manipulation

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of visuomotor policies when facing out-of-distribution execution errors or dynamic environments. To this end, the authors propose ReV, a closed-loop framework trained solely on raw expert demonstrations. ReV is the first to integrate sparse reference points—derived from human input or high-level planners—into a diffusion-based policy. By employing a coupled global-local diffusion head architecture together with a trajectory-guidance mechanism, the method enables real-time trajectory replanning without requiring additional data or fine-tuning. Training on perturbed expert demonstrations facilitates efficient closed-loop control, leading to significantly higher task success rates across multiple complex simulated and real-world scenarios, thereby demonstrating strong robustness and generalization capabilities.
📝 Abstract
This paper addresses a fundamental problem of visuomotor policy learning for robotic manipulation: how to enhance robustness to out-of-distribution execution errors or dynamically re-routed trajectories when the model relies solely on the original expert demonstrations for training. We introduce the Referring-Aware Visuomotor Policy (ReV), a closed-loop framework that can adapt to unforeseen circumstances by instantly incorporating sparse referring points provided by a human or a high-level reasoning planner. Specifically, ReV leverages coupled diffusion heads to preserve standard task-execution patterns while seamlessly integrating sparse referring points via a trajectory-steering strategy. Upon receiving a referring point, the global diffusion head first generates a sequence of globally consistent yet temporally sparse action anchors and identifies the precise temporal position of the referring point within this sequence. The local diffusion head then adaptively interpolates between adjacent anchors based on the current temporal position for the task at hand. This closed-loop process repeats at every execution step, enabling real-time trajectory replanning in response to dynamic changes in the scene. In practice, rather than relying on elaborate annotations, ReV is trained only by applying targeted perturbations to expert demonstrations. Without any additional data or fine-tuning, ReV achieves higher success rates across challenging simulated and real-world tasks.
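The abstract's control loop — sparse global anchors, local densification between adjacent anchors, and replanning at every step — can be sketched in miniature. This is not the paper's implementation: the paper provides no code, and the real heads are diffusion models conditioned on observations. Here `global_head` and `local_head` are hypothetical linear-interpolation stand-ins that only illustrate the data flow of the closed loop.

```python
import numpy as np

def global_head(state, referring_point, num_anchors=4):
    """Stand-in for the global diffusion head: emit temporally sparse,
    globally consistent action anchors and the anchor index at which the
    referring point is slotted in (here: simple linear spacing, with the
    referring point placed on the final anchor)."""
    anchors = np.linspace(state, referring_point, num_anchors + 1)[1:]
    ref_idx = num_anchors - 1
    return anchors, ref_idx

def local_head(state, next_anchor, steps=3):
    """Stand-in for the local diffusion head: densify the segment from the
    current state to the next anchor into executable low-level actions."""
    return np.linspace(state, next_anchor, steps + 1)[1:]

def closed_loop_control(state, referring_point, max_steps=60, tol=1e-2):
    """Closed loop: plan global anchors, densify locally, execute only the
    first action, observe the new state, and replan — every step."""
    trajectory = [state]
    for _ in range(max_steps):
        anchors, _ = global_head(state, referring_point)
        actions = local_head(state, anchors[0])
        state = actions[0]  # execute one action, then replan from scratch
        trajectory.append(state)
        if np.linalg.norm(state - referring_point) < tol:
            break
    return np.array(trajectory)

traj = closed_loop_control(np.zeros(3), np.array([0.3, 0.2, 0.1]))
```

Because only the first densified action is executed before replanning, a moved referring point (e.g. a displaced target object) is picked up on the very next iteration, which is the property the paper's closed-loop design is after.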
Problem

Research questions and friction points this paper is trying to address.

visuomotor policy learning
out-of-distribution robustness
trajectory replanning
closed-loop manipulation
expert demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

referring-aware policy
closed-loop manipulation
diffusion-based trajectory planning
visuomotor policy learning
real-time replanning
Jiahua Ma
Sun Yat-sen University
Yiran Qin
Oxford University
Xin Wen
Sun Yat-sen University
Yixiong Li
Sun Yat-sen University
Yuyu Sun
Sun Yat-sen University
Yulan Guo
Professor, Sun Yat-sen University
3D Vision · Machine Learning · Robotics
Liang Lin
Fellow of IEEE/IAPR, Professor of Computer Science, Sun Yat-sen University
Embodied AI · Causal Inference and Learning · Multimodal Data Analysis
Ruimao Zhang
Sun Yat-sen University