Look Before You Leap: Autonomous Exploration for LLM Agents

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of large language model agents to rely prematurely on prior knowledge in unfamiliar environments, which leads to inadequate exploration and task failure. The authors propose an Explore-then-Act paradigm that decouples exploration from execution: agents first gather environmental information within a fixed interaction budget, then use the grounded knowledge they acquired to accomplish the task. The study formalizes autonomous exploration as an agent capability and introduces Exploration Checkpoint Coverage, a verifiable metric for it. It also devises a training strategy that interleaves exploration rollouts and task-execution rollouts, optimizing each type with its own verifiable reward. According to the authors, agents trained with standard task-oriented reinforcement learning show narrow, repetitive behaviors that constrain downstream performance, and learning to explore systematically is needed for generalization to unseen environments.
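The two-phase control flow described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `ToyEnv`, `explore_policy`, `act_policy`, and `explore_then_act` are hypothetical names, and the hand-written policies stand in for what would be an LLM agent in the paper.

```python
from typing import Callable, List, Tuple

Notes = List[Tuple[str, str]]  # (action, observation) pairs gathered so far


class ToyEnv:
    """Hypothetical environment used only for illustration."""

    def __init__(self) -> None:
        self._responses = {
            "look": "a locked door and a closed drawer",
            "open drawer": "a key is inside the drawer",
        }

    def step(self, action: str) -> str:
        # Unknown actions have no effect in this toy world.
        return self._responses.get(action, "nothing happens")


def explore_policy(notes: Notes) -> str:
    """Stand-in exploration policy: try the first action not yet attempted."""
    tried = {action for action, _ in notes}
    for candidate in ("look", "open drawer", "knock"):
        if candidate not in tried:
            return candidate
    return "look"


def act_policy(notes: Notes) -> str:
    """Stand-in task policy: the plan depends on what exploration revealed."""
    if any("key" in observation for _, observation in notes):
        return "take key, then unlock door"
    return "push door"  # fallback based on prior assumptions only


def explore_then_act(
    env: ToyEnv,
    explore: Callable[[Notes], str],
    act: Callable[[Notes], str],
    budget: int,
) -> Tuple[str, Notes]:
    """Phase 1: spend `budget` interactions exploring. Phase 2: act on the notes."""
    notes: Notes = []
    for _ in range(budget):
        action = explore(notes)
        notes.append((action, env.step(action)))
    return act(notes), notes
```

With `budget=0` the agent in this sketch falls back on its prior assumption and returns `"push door"`; with `budget=2` it discovers the key and plans around it, which is the contrast with premature exploitation that the summary describes.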

📝 Abstract

Large language model based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptive agents. To formalize and quantify this capability, we introduce Exploration Checkpoint Coverage, a verifiable metric that measures how broadly an agent discovers key states, objects, and affordances. Our systematic evaluation reveals that agents trained with standard task-oriented reinforcement learning consistently exhibit narrow and repetitive behaviors that impede downstream performance. To address this limitation, we develop a training strategy that interleaves task-execution rollouts and exploration rollouts, with each type of rollout optimized by its corresponding verifiable reward. Building on this training strategy, we propose the Explore-then-Act paradigm, which decouples information-gathering from task execution: agents first utilize an interaction budget to acquire grounded environmental knowledge, then leverage it for task resolution. Our results demonstrate that learning to systematically explore is imperative for building generalizable and real-world-ready agents.
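The abstract does not give the formula for Exploration Checkpoint Coverage. One plausible reading, shown here purely as an assumption, is the fraction of a predefined set of checkpoints (key states, objects, and affordances) that an exploration trajectory reaches. The function name `checkpoint_coverage` and the checkpoint labels are hypothetical.

```python
from typing import Iterable, Set


def checkpoint_coverage(visited: Iterable[str], checkpoints: Set[str]) -> float:
    """Fraction of predefined checkpoints discovered at least once.

    Assumption: coverage is a simple set ratio. The paper's actual
    definition may weight or verify checkpoints differently.
    """
    if not checkpoints:
        raise ValueError("checkpoints must be non-empty")
    discovered = checkpoints & set(visited)
    return len(discovered) / len(checkpoints)


# Hypothetical checkpoint set for a small household environment.
CHECKPOINTS = {"kitchen", "drawer", "key", "door"}

# A narrow, repetitive trajectory revisits the same checkpoint.
narrow = ["door", "door", "door"]
# A broader trajectory reaches more distinct checkpoints ("hallway" is not one).
broad = ["kitchen", "drawer", "key", "hallway"]

narrow_score = checkpoint_coverage(narrow, CHECKPOINTS)  # 1 of 4 checkpoints
broad_score = checkpoint_coverage(broad, CHECKPOINTS)    # 3 of 4 checkpoints
```

Under this reading, repeated visits add nothing to the score, so the narrow and repetitive behavior the abstract attributes to task-only training would score low regardless of trajectory length.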

Problem

Research questions and friction points this paper is trying to address.

autonomous exploration

premature exploitation

LLM agents

environmental knowledge

adaptive agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

autonomous exploration

Explore-then-Act

Exploration Checkpoint Coverage