MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation

📅 2025-12-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Imitation learning for robotic manipulation suffers from insufficient diversity in spatial configurations: existing demonstration trajectories are predominantly collected in static settings—featuring fixed object poses, target locations, and camera viewpoints—leading to poor spatial generalization. To address this, we propose MOtion-Based Variability Enhancement (MOVE), a novel data augmentation strategy that introduces controlled, active motion of movable objects within a single demonstration trajectory. This induces dense, continuous variations in spatial configurations implicitly, thereby breaking the static data collection paradigm. MOVE is integrated into imitation learning frameworks both in simulation and on real robots, enabling joint training for dynamic data augmentation and spatial generalization. Experiments demonstrate that MOVE improves the average task success rate from 22.2% to 39.1% (a 76.1% relative improvement) in simulation, enhances data efficiency by 2–5× on selected tasks, and significantly boosts generalization to unseen spatial configurations.

📝 Abstract
Imitation learning has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by data scarcity. Despite prior work on collecting large-scale datasets, a significant gap to robust spatial generalization remains. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a single, static spatial configuration of the environment. This includes fixed object and target positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose MOtion-Based Variability Enhancement (MOVE), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into any movable objects in the environment during each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, on simulation tasks requiring strong spatial generalization, MOVE achieves an average success rate of 39.1%, a 76.1% relative improvement over the static data collection paradigm (22.2%), and yields up to 2–5× gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE.
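The core idea of the abstract can be illustrated with a toy sketch: a static demonstration records one fixed object pose for the whole trajectory, while a MOVE-style demonstration continuously perturbs the movable object's pose, so a single trajectory covers many distinct spatial configurations. The function names, the sinusoidal motion model, and the 2D pose representation below are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

def static_demo(obj_pose, n_steps=50):
    """Baseline sketch: the object's pose stays fixed for the whole demo."""
    return [{"obj_pose": obj_pose, "step": t} for t in range(n_steps)]

def move_demo(obj_pose, n_steps=50, amplitude=0.05, seed=0):
    """MOVE-style sketch (illustrative assumption): inject smooth, controlled
    motion into the movable object's 2D pose at every step, so that one
    trajectory implicitly visits a dense set of spatial configurations."""
    rng = random.Random(seed)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    frames = []
    for t in range(n_steps):
        # Smooth circular perturbation around the nominal pose.
        dx = amplitude * math.sin(2.0 * math.pi * t / n_steps + phase)
        dy = amplitude * math.cos(2.0 * math.pi * t / n_steps + phase)
        frames.append({"obj_pose": (obj_pose[0] + dx, obj_pose[1] + dy),
                       "step": t})
    return frames

def distinct_configs(frames, resolution=0.01):
    """Count distinct spatial configurations at a given grid resolution."""
    return len({(round(f["obj_pose"][0] / resolution),
                 round(f["obj_pose"][1] / resolution)) for f in frames})
```

Under this sketch, `distinct_configs(static_demo((0.3, 0.2)))` is 1, while the same trajectory length with injected motion yields many distinct configurations, which is the diversity gain the paradigm targets.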
Problem

Research questions and friction points this paper is trying to address.

Addresses data scarcity in robotic imitation learning
Enhances spatial generalization through dynamic demonstrations
Improves data efficiency with motion-based augmentation strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces motion-based data collection paradigm
Augments demonstrations with object motion
Generates diverse spatial configurations per trajectory
Authors

Huanqian Wang
BNRist, Tsinghua University

Chi Bene Chen
BNRist, Tsinghua University

Yang Yue
BNRist, Tsinghua University

Danhua Tao
Southeast University

Tong Guo
Duke University, Fuqua School of Business

Shaoxuan Xie
Beijing Academy of Artificial Intelligence

Denghang Huang
Beijing Academy of Artificial Intelligence

Shiji Song
Tsinghua University

Guocai Yao
Beijing Academy of Artificial Intelligence

Gao Huang
BNRist, Tsinghua University