DockAnywhere: Data-Efficient Visuomotor Policy Learning for Mobile Manipulation via Novel Demonstration Generation

📅 2026-04-16

🤖 AI Summary
This work addresses the limited viewpoint generalization of visuomotor policies in mobile manipulation, which arises because the robot's docking position varies between runs. The authors propose DockAnywhere, a framework that decouples docking-dependent base motion from contact-rich manipulation skills that stay invariant across viewpoints. From a single demonstration, the method generates diverse feasible docking configurations and their corresponding trajectories. It combines structure-preserving trajectory augmentation with point-cloud-based 3D visual synthesis to expand the set of training viewpoints while keeping observations and actions geometrically consistent. Evaluated on the ManiSkill simulation benchmark and on real robotic platforms, the method improves policy success rates and generalizes to docking viewpoints unseen during training.
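
The idea of sampling docking proposals under feasibility constraints can be illustrated with a small rejection sampler. This is a minimal sketch, not the authors' implementation: the planar pose `(x, y, yaw)`, the reach annulus, the heading tolerance, and the names `is_feasible` and `sample_docking_poses` are all assumptions made for illustration, and a real system would also check collisions and arm kinematics.

```python
import math
import random


def wrap_angle(a):
    """Wrap an angle to the interval [-pi, pi]."""
    return math.atan2(math.sin(a), math.cos(a))


def is_feasible(pose, target_xy, reach=(0.4, 0.8),
                max_heading_error=math.radians(30)):
    """Hypothetical feasibility test for a base pose (x, y, yaw).

    The pose is accepted when the target lies within the assumed reach
    annulus and the base roughly faces the target.
    """
    x, y, yaw = pose
    dx, dy = target_xy[0] - x, target_xy[1] - y
    dist = math.hypot(dx, dy)
    heading_error = wrap_angle(math.atan2(dy, dx) - yaw)
    return reach[0] <= dist <= reach[1] and abs(heading_error) <= max_heading_error


def sample_docking_poses(target_xy, n, reach=(0.4, 0.8),
                         max_heading_error=math.radians(30),
                         seed=0, max_tries=100_000):
    """Rejection-sample up to n feasible base poses around the target."""
    rng = random.Random(seed)
    poses = []
    for _ in range(max_tries):
        if len(poses) == n:
            break
        # Propose a pose uniformly in a box around the target, any heading.
        pose = (target_xy[0] + rng.uniform(-reach[1], reach[1]),
                target_xy[1] + rng.uniform(-reach[1], reach[1]),
                rng.uniform(-math.pi, math.pi))
        if is_feasible(pose, target_xy, reach, max_heading_error):
            poses.append(pose)
    return poses
```

Calling `sample_docking_poses((1.0, 2.0), 20)` returns a list of base poses that all pass `is_feasible`; each such pose would then be paired with a regenerated trajectory and a synthesized observation.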

📝 Abstract
Mobile manipulation is a fundamental capability that enables robots to interact in expansive environments such as homes and factories. Most existing approaches follow a two-stage paradigm, where the robot first navigates to a docking point and then performs fixed-base manipulation using powerful visuomotor policies. However, real-world mobile manipulation often suffers from a view generalization problem: the docking point shifts between runs, so the policy observes the scene from viewpoints it was not trained on. To address this issue, we propose a novel low-cost demonstration generation framework named DockAnywhere, which improves viewpoint generalization under docking variability by lifting a single demonstration to diverse feasible docking configurations. Specifically, DockAnywhere lifts a trajectory to any feasible docking point by decoupling docking-dependent base motions from contact-rich manipulation skills that remain invariant across viewpoints. Docking proposals are sampled under feasibility constraints, and the corresponding trajectories are generated via structure-preserving augmentation. Visual observations are synthesized in 3D space by representing the robot and objects as point clouds and applying point-level spatial editing, which keeps observations and actions consistent across viewpoints. Extensive experiments on ManiSkill and real-world platforms demonstrate that DockAnywhere substantially improves policy success rates and generalizes to novel viewpoints from docking points unseen during training, enhancing the generalization capability of mobile manipulation policies in real-world deployment.
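One way to read "consistency of observation and action across viewpoints" is that scene points and end-effector waypoints are fixed in the world, so both must be re-expressed with the same rigid transform when the base docks elsewhere. The sketch below assumes a planar base pose `(x, y, yaw)` and 3D points `(x, y, z)`; the function names are hypothetical and the abstract does not specify the actual editing procedure, which would also need to handle the robot's own points and occlusion.

```python
import math


def base_to_world(point_b, base_pose):
    """Map a point from the base frame to the world frame."""
    bx, by, yaw = base_pose
    c, s = math.cos(yaw), math.sin(yaw)
    px, py, pz = point_b
    return (bx + c * px - s * py, by + s * px + c * py, pz)


def world_to_base(point_w, base_pose):
    """Map a point from the world frame to the base frame (inverse transform)."""
    bx, by, yaw = base_pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = point_w[0] - bx, point_w[1] - by
    return (c * dx + s * dy, -s * dx + c * dy, point_w[2])


def relabel_for_new_dock(points_b, src_pose, new_pose):
    """Re-express base-frame points recorded at src_pose as seen from new_pose.

    Applying this to both the object point cloud and the end-effector
    waypoints keeps their relative geometry unchanged.
    """
    return [world_to_base(base_to_world(p, src_pose), new_pose)
            for p in points_b]
```

Because the same function is applied to the point cloud and to the waypoints, the distance between any waypoint and any scene point is preserved, which is the invariant a contact-rich skill depends on.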
Problem

Research questions and friction points this paper is trying to address.

mobile manipulation
view generalization
docking variability
visuomotor policy
real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

data-efficient learning
viewpoint generalization
demonstration generation
mobile manipulation
visuomotor policy
Ziyu Shan
Nanyang Technological University
Embodied AI, Point Cloud Quality Assessment, Low-level Vision
Yuheng Zhou
Nanyang Technological University, Singapore
Gaoyuan Wu
Nanyang Technological University, Singapore
Ziheng Ji
Nanyang Technological University, Singapore
Zhenyu Wu
Beijing University of Posts and Telecommunications, Beijing, China
Ziwei Wang
School of Electrical and Electronic Engineering, Nanyang Technological University
embodied AI, robotics, computer vision