🤖 AI Summary
This work addresses a common oversight in existing embodied intelligence approaches: the lack of synergistic integration between exploration and long-term memory, which hinders lifelong learning and complex, long-horizon decision-making. To bridge this gap, the authors propose the LMEE framework and introduce LMEE-Bench, the first evaluation benchmark to jointly incorporate multi-goal navigation and memory-based question answering, emphasizing both process- and outcome-oriented assessment. They further present MemoryExplorer, a method that unifies active exploration and memory retrieval by fine-tuning a multimodal large language model within a reinforcement learning paradigm, guided by a multi-task reward covering action prediction, frontier selection, and question answering. Experiments demonstrate that MemoryExplorer significantly outperforms current methods on long-horizon embodied tasks, markedly improving both exploration efficiency and memory utilization.
📝 Abstract
An ideal embodied agent should possess lifelong learning capabilities to handle long-horizon and complex tasks, enabling continuous operation in general environments. This requires the agent not only to accurately accomplish given tasks but also to leverage long-term episodic memory to optimize decision-making. However, existing mainstream one-shot embodied tasks primarily focus on task completion results, neglecting the crucial processes of exploration and memory utilization. To address this, we propose Long-term Memory Embodied Exploration (LMEE), which aims to unify the agent's exploratory cognition and decision-making behaviors to promote lifelong learning. We further construct a corresponding dataset and benchmark, LMEE-Bench, incorporating multi-goal navigation and memory-based question answering to comprehensively evaluate both the process and outcome of embodied exploration. To enhance the agent's memory recall and proactive exploration capabilities, we propose MemoryExplorer, a novel method that fine-tunes a multimodal large language model through reinforcement learning to encourage active memory querying. By incorporating a multi-task reward function that covers action prediction, frontier selection, and question answering, our model achieves proactive exploration. Extensive experiments against state-of-the-art embodied exploration models demonstrate that our approach offers significant advantages on long-horizon embodied tasks. Our dataset and code will be released at https://wangsen99.github.io/papers/lmee/
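As a rough illustration of the multi-task reward idea described in the abstract, the three task signals (action prediction, frontier selection, question answering) could be combined into a single scalar reward for RL fine-tuning. The sketch below is hypothetical: the names, weights, and linear-combination form are illustrative assumptions, not the paper's actual reward design.

```python
# Hedged sketch of a multi-task reward combining the three signals named in
# the abstract. Weights and field names are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class StepOutcome:
    action_correct: bool    # predicted action matches the reference action
    frontier_correct: bool  # selected frontier matches the reference frontier
    answer_score: float     # QA quality in [0, 1]; 0.0 if no question this step


def multi_task_reward(outcome: StepOutcome,
                      w_action: float = 0.3,
                      w_frontier: float = 0.3,
                      w_qa: float = 0.4) -> float:
    """Combine the three task signals into one scalar RL reward (hypothetical)."""
    r_action = 1.0 if outcome.action_correct else 0.0
    r_frontier = 1.0 if outcome.frontier_correct else 0.0
    return w_action * r_action + w_frontier * r_frontier + w_qa * outcome.answer_score


# Example: correct action and frontier choice, partial-credit answer.
reward = multi_task_reward(StepOutcome(True, True, 0.5))
```

In practice such a scalar would be fed to the policy-gradient update during reinforcement learning fine-tuning; the actual reward shaping used by MemoryExplorer may differ.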