DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation

πŸ“… 2024-11-07
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 8
✨ Influential: 0
πŸ€– AI Summary
To address the limitations imposed by the static-environment assumption in open-vocabulary mobile manipulation, this paper proposes DynaMem, an online dynamic spatio-semantic memory framework for real-world changing scenes. The method centers on an incremental 3D point-cloud memory that supports real-time insertion, deletion, update, and query of objects as they move, are occluded, or enter and leave the scene. Natural-language localization queries are answered either with multimodal large language models (MLLMs) or with open-vocabulary vision-language features (e.g., CLIP/SigLIP). The system runs in real time on the Stretch SE3 robot platform. Evaluated in three real-world and nine offline dynamic scenes, it achieves a 70% average pick-and-drop success rate on non-stationary objects, more than double that of the best static-memory baseline. Code, demonstration videos, and deployment documentation are publicly released.
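The summary's core idea, a point-cloud memory supporting insertion, deletion, and semantic query, can be sketched as a voxel map keyed by discretized coordinates. This is a minimal illustration under assumed details (voxel hashing, per-voxel feature overwrite, cosine-similarity lookup), not the authors' implementation:

```python
import numpy as np

class DynamicVoxelMemory:
    """Hypothetical sketch of a dynamic spatio-semantic memory.
    Each occupied voxel stores its last observation time and a
    semantic feature vector (e.g., a CLIP-style image embedding)."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        self.voxels = {}  # (i, j, k) -> {"t": float, "feat": np.ndarray}

    def _key(self, point):
        return tuple(np.floor(np.asarray(point) / self.voxel_size).astype(int))

    def insert(self, points, feats, t):
        # Add or overwrite voxels observed in the current frame.
        for p, f in zip(points, feats):
            self.voxels[self._key(p)] = {"t": t,
                                         "feat": np.asarray(f, dtype=np.float32)}

    def remove_stale(self, observed_empty_points, t):
        # Delete voxels the current depth frame sees as free space:
        # this is how moved or removed objects leave the memory.
        for p in observed_empty_points:
            self.voxels.pop(self._key(p), None)

    def query(self, text_feat):
        # Return the center of the voxel whose feature best matches
        # the (normalized) text embedding of the query.
        if not self.voxels:
            return None
        keys = list(self.voxels)
        feats = np.stack([self.voxels[k]["feat"] for k in keys])
        feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
        q = np.asarray(text_feat, dtype=np.float32)
        q = q / (np.linalg.norm(q) + 1e-8)
        best = keys[int(np.argmax(feats @ q))]
        return (np.asarray(best) + 0.5) * self.voxel_size
```

In a real system the feature vectors would come from a vision-language model and the "observed empty" set from ray-casting the depth image; here both are left abstract to keep the data-structure logic visible.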

πŸ“ Abstract
Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits the system's applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems. Our code as well as our experiment and deployment videos are open sourced and can be found on our project website: https://dynamem.github.io/
Problem

Research questions and friction points this paper is trying to address.

Dynamic environment handling in mobile manipulation tasks
Open-vocabulary object localization in changing scenes
Continuous memory updates for object movement and appearance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic spatio-semantic memory for environment representation
3D data structure for dynamic point cloud memory
Multimodal LLMs for open-vocabulary object localization
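The abstract also notes that the robot can "search for objects not found in memory." A hypothetical decision rule in that spirit, with an assumed similarity threshold and a robot at the origin (neither specified in this summary), might look like:

```python
import numpy as np

def localize_or_explore(voxel_feats, voxel_centers, frontier_centers,
                        text_feat, sim_threshold=0.25):
    """Hypothetical sketch: localize the queried object if some memory
    feature matches the text embedding confidently enough; otherwise
    return an unexplored frontier to drive toward."""
    q = np.asarray(text_feat, dtype=np.float32)
    q = q / (np.linalg.norm(q) + 1e-8)
    if len(voxel_feats):
        feats = np.asarray(voxel_feats, dtype=np.float32)
        feats = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
        sims = feats @ q
        best = int(np.argmax(sims))
        if sims[best] >= sim_threshold:
            return "navigate", np.asarray(voxel_centers[best])
    # No confident match in memory: head to the nearest frontier
    # (distances measured from an assumed robot position at the origin).
    dists = np.linalg.norm(np.asarray(frontier_centers, dtype=np.float32), axis=1)
    return "explore", np.asarray(frontier_centers)[int(np.argmin(dists))]
```

The threshold value and the frontier-selection heuristic are illustrative assumptions; the paper's actual policy may differ.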