🤖 AI Summary
To address three key challenges in embodied exploration with multimodal large language models (MLLMs)—outdated pretraining knowledge, high training costs under long-horizon sparse rewards, and unreliable decision-making due to excessively large frontier action spaces—this paper proposes a training-free inference-time optimization framework. First, context-driven retrospective experience replay dynamically injects abstracted historical experiences into the reasoning process. Second, coarse-to-fine hierarchical frontier selection substantially compresses the vision-action space while enhancing decision traceability and robustness. Integrating MLLMs, context-aware experience distillation, and hierarchical decision-making, our method outperforms strong baselines across multiple embodied exploration benchmarks, achieving up to a 3× improvement in task success rate and navigation efficiency. The implementation is publicly available.
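The retrospective experience replay described above can be pictured as a small inference-time memory: abstracted lessons from past episodes are stored and the most context-relevant ones are retrieved and prepended to the agent's prompt. The sketch below is illustrative only; the class name, token-overlap similarity, and prompt format are assumptions, not the paper's exact design.

```python
class ExperienceReplay:
    """Minimal sketch of context-driven experience retrieval at inference time."""

    def __init__(self):
        # Each entry pairs a bag of context tokens with an abstracted lesson.
        self.memory = []

    def add(self, context, lesson):
        self.memory.append((set(context.lower().split()), lesson))

    def retrieve(self, context, k=2):
        # Rank stored experiences by token overlap with the current context;
        # a real system would use an embedding similarity instead.
        query = set(context.lower().split())
        ranked = sorted(self.memory,
                        key=lambda m: len(query & m[0]), reverse=True)
        return [lesson for _, lesson in ranked[:k]]

    def augment_prompt(self, prompt, context):
        # Inject the retrieved lessons ahead of the task prompt.
        lessons = "\n".join(f"- {l}" for l in self.retrieve(context))
        return f"Past experience:\n{lessons}\n\nTask:\n{prompt}"
```

A retrieval keyed on the current observation keeps the injected context short, which matters because MLLM context windows are limited at inference time.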
📝 Abstract
Embodied exploration is a target-driven process that requires embodied agents to possess fine-grained perception and knowledge-enhanced decision making. While recent attempts leverage MLLMs for exploration due to their strong perceptual and reasoning abilities, we find that MLLM-based embodied agents remain suboptimal in exploring new environments: (i) they rely on rich but stale pre-trained knowledge, (ii) training-based approaches such as imitation learning or reinforcement learning are expensive for long-horizon tasks with sparse outcome rewards, and (iii) frontier-based exploration yields a large, visually nuanced action space over which MLLMs struggle to make reliable decisions. We address these challenges with ReEXplore, a training-free framework that performs retrospective experience replay to inject distilled, abstracted experience at inference time, and hierarchical frontier selection to decompose frontier ranking into coarse-to-fine decisions. Our approach enables robust, traceable, and efficient exploration. Across multiple embodied exploration benchmarks, ReEXplore yields substantial improvements over strong MLLM baselines, achieving up to 3× higher success rate and navigation efficiency under open-source backbones.
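The hierarchical frontier selection described in the abstract can be sketched as a two-stage ranking: frontiers are first grouped into coarse regions, one region is chosen, and only then are the frontiers inside that region compared. In the paper an MLLM performs the ranking; in this minimal sketch simple scoring callables stand in for those MLLM queries, and all names are illustrative assumptions rather than the authors' API.

```python
from collections import defaultdict

def select_frontier(frontiers, region_of, score_region, score_frontier):
    """Coarse-to-fine selection: pick a region first, then a frontier
    within it, so each decision sees only a small candidate set."""
    # Coarse stage: bucket candidate frontiers by their region.
    regions = defaultdict(list)
    for f in frontiers:
        regions[region_of(f)].append(f)
    # Rank regions as groups (an MLLM would compare region-level views here).
    best_region = max(regions, key=lambda r: score_region(r, regions[r]))
    # Fine stage: rank only the frontiers inside the chosen region.
    return max(regions[best_region], key=score_frontier)
```

Compressing the decision this way keeps each query over a handful of candidates instead of the full frontier set, which is the traceability and robustness benefit the abstract refers to: the coarse choice records *why* a region was preferred before any fine-grained comparison happens.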