Object-Centric Mobile Manipulation through SAM2-Guided Perception and Imitation Learning

📅 2025-07-14

📈 Citations: 0

✨ Influential: 0

career value

204K/year

🤖 AI Summary

Existing mobile manipulation frameworks decouple navigation from manipulation, leading to task failures due to misaligned approach angles and limiting generalization. This work proposes an object-centric, orientation-robust manipulation paradigm—the first to integrate the SAM2 foundation model into mobile manipulation—unifying orientation-aware promptable segmentation with manipulation semantics for cross-view task understanding and execution. Methodologically, we design an end-to-end policy comprising: (i) SAM2-based orientation-aware visual segmentation; (ii) orientation-conditioned imitation learning; and (iii) action-sequence modeling, trained on a custom-built mobile manipulation platform. Experiments on multi-angle pick-and-place tasks demonstrate that our approach significantly outperforms Action Chunking Transformer in both generalization and robustness, particularly under viewpoint variation. This establishes a novel, scalable paradigm for general-purpose mobile manipulation robots.

Technology Category

Application Category

📝 Abstract

Imitation learning for mobile manipulation is a key challenge in the field of robotic manipulation. However, current mobile manipulation frameworks typically decouple navigation and manipulation, executing manipulation only after reaching a certain location. This can lead to performance degradation when navigation is imprecise, especially due to misalignment in approach angles. To enable a mobile manipulator to perform the same task from diverse orientations, an essential capability for building general-purpose robotic models, we propose an object-centric method based on SAM2, a foundation model towards solving promptable visual segmentation in images, which incorporates manipulation orientation information into our model. Our approach enables consistent understanding of the same task from different orientations. We deploy the model on a custom-built mobile manipulator and evaluate it on a pick-and-place task under varied orientation angles. Compared to Action Chunking Transformer, our model maintains superior generalization when trained with demonstrations from varied approach angles. This work significantly enhances the generalization and robustness of imitation learning-based mobile manipulation systems.

Problem

Research questions and friction points this paper is trying to address.

Enabling mobile manipulators to perform tasks from diverse orientations

Improving generalization in imitation learning for mobile manipulation

Integrating manipulation orientation information using SAM2-guided perception

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SAM2 for object-centric visual segmentation

Integrates manipulation orientation into model

Enhances generalization with varied approach angles

🔎 Similar Papers

Vision-based Manipulation from Single Human Video with Open-World Object Graphs