Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing mobile manipulation frameworks decouple navigation from manipulation, leading to task failures due to misaligned approach angles and limiting generalization. This work proposes an object-centric, orientation-robust manipulation paradigm—the first to integrate the SAM2 foundation model into mobile manipulation—unifying orientation-aware promptable segmentation with manipulation semantics for cross-view task understanding and execution. Methodologically, we design an end-to-end policy comprising: (i) SAM2-based orientation-aware visual segmentation; (ii) orientation-conditioned imitation learning; and (iii) action-sequence modeling, trained on a custom-built mobile manipulation platform. Experiments on multi-angle pick-and-place tasks demonstrate that our approach significantly outperforms Action Chunking Transformer in both generalization and robustness, particularly under viewpoint variation. This establishes a novel, scalable paradigm for general-purpose mobile manipulation robots.

Technology Category

Application Category

📝 Abstract
Imitation learning for mobile manipulation is a key challenge in the field of robotic manipulation. However, current mobile manipulation frameworks typically decouple navigation and manipulation, executing manipulation only after reaching a certain location. This can lead to performance degradation when navigation is imprecise, especially due to misalignment in approach angles. To enable a mobile manipulator to perform the same task from diverse orientations, an essential capability for building general-purpose robotic models, we propose an object-centric method based on SAM2, a foundation model towards solving promptable visual segmentation in images, which incorporates manipulation orientation information into our model. Our approach enables consistent understanding of the same task from different orientations. We deploy the model on a custom-built mobile manipulator and evaluate it on a pick-and-place task under varied orientation angles. Compared to Action Chunking Transformer, our model maintains superior generalization when trained with demonstrations from varied approach angles. This work significantly enhances the generalization and robustness of imitation learning-based mobile manipulation systems.
Problem

Research questions and friction points this paper is trying to address.

Enabling mobile manipulators to perform tasks from diverse orientations
Improving generalization in imitation learning for mobile manipulation
Integrating manipulation orientation information using SAM2-guided perception
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SAM2 for object-centric visual segmentation
Integrates manipulation orientation into model
Enhances generalization with varied approach angles
🔎 Similar Papers
No similar papers found.
W
Wang Zhicheng
Learning Machines Group, Graduate School of Informatics, Kyoto University, Kyoto, Japan
S
Satoshi Yagi
Learning Machines Group, Graduate School of Informatics, Kyoto University, Kyoto, Japan
S
Satoshi Yamamori
Dept. of Brain Robot Interface, Computational Neuroscience Labs, ATR, Kyoto, Japan
Jun Morimoto
Jun Morimoto
Kyoto University & ATR Computational Neuroscience Labs
RoboticsMachine LearningComputational Neuroscience