Agent Learning via Early Experience

📅 2025-10-09
🤖 AI Summary
Existing language agents rely heavily on supervised fine-tuning on expert demonstrations, which scales poorly and generalizes narrowly. To address this, the paper proposes the "early experience" paradigm: the agent learns from interaction data generated by its own actions, with the resulting future states serving as supervision in place of reward signals. Two strategies are studied within this paradigm: implicit world modeling, which uses the collected states to ground the policy in environment dynamics, and self-reflection, where the agent learns from its own suboptimal actions to improve reasoning and decision-making. Experiments across eight diverse environments and multiple model families show consistent gains in task effectiveness and out-of-domain generalization. Moreover, in settings with verifiable rewards, early experience provides a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.

📝 Abstract
A long-term goal of language agents is to learn and improve through their own experience, ultimately outperforming humans in complex, real-world tasks. However, training agents from experience data with reinforcement learning remains difficult in many environments, which either lack verifiable rewards (e.g., websites) or require inefficient long-horizon rollouts (e.g., multi-turn tool use). As a result, most current agents rely on supervised fine-tuning on expert data, which is challenging to scale and generalizes poorly. This limitation stems from the nature of expert demonstrations: they capture only a narrow range of scenarios and expose the agent to limited environment diversity. We address this limitation with a middle-ground paradigm we call early experience: interaction data generated by the agent's own actions, where the resulting future states serve as supervision without reward signals. Within this paradigm we study two strategies of using such data: (1) Implicit world modeling, which uses collected states to ground the policy in environment dynamics; and (2) Self-reflection, where the agent learns from its suboptimal actions to improve reasoning and decision-making. We evaluate across eight diverse environments and multiple model families. Our approaches consistently improve effectiveness and out-of-domain generalization, highlighting the value of early experience. Moreover, in environments with verifiable rewards, our results provide promising signals that early experience offers a strong foundation for subsequent reinforcement learning, positioning it as a practical bridge between imitation learning and fully experience-driven agents.
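
The data recipe described in the abstract is simple to sketch: starting from states the agent encounters, it proposes alternative actions of its own, executes them, and logs the resulting future states as reward-free supervision. Below is a minimal sketch under assumed interfaces; `propose_actions` and `step_from` are hypothetical helpers, not the paper's actual implementation.

```python
# Minimal sketch of "early experience" data collection (hypothetical API):
# the agent branches off its own actions and records the resulting states
# as supervision -- no reward signal and no extra expert labels are needed.

from dataclasses import dataclass

@dataclass
class EarlyExperience:
    state: str        # observation the agent acted from
    action: str       # action proposed by the current policy
    next_state: str   # future state returned by the environment (the supervision)

def collect_early_experience(env, policy, start_states, k_branches=3):
    """Roll out the agent's own actions from each state and log the outcomes."""
    records = []
    for state in start_states:
        # Sample several alternative actions from the policy itself.
        for action in policy.propose_actions(state, n=k_branches):
            # Hypothetical helper: reset the environment to `state`, then step.
            next_state = env.step_from(state, action)
            records.append(EarlyExperience(state, action, next_state))
    return records
```

Records collected this way feed both training strategies described under Innovation below.
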
Problem

Research questions and friction points this paper is trying to address.

Training language agents through experience without reward signals in diverse environments
Overcoming limitations of supervised fine-tuning using agent-generated interaction data
Improving agent generalization via early experience strategies like world modeling and self-reflection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Early experience paradigm uses agent's own interaction data
Implicit world modeling grounds policy in environment dynamics
Self-reflection learns from suboptimal actions to improve reasoning (both strategies are sketched below)
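
To make the two strategies concrete, the sketch below phrases each as an ordinary next-token-prediction loss over the collected transitions: implicit world modeling trains the model to predict the next state from (state, action), while self-reflection trains it on a natural-language reflection about a suboptimal action followed by the better action. The prompt templates, the `reflection` text (generated by the model itself in this strategy), and the function names are illustrative assumptions, not the paper's exact formulation; the model/tokenizer calls assume a Hugging Face-style causal LM, and `rec` reuses the `EarlyExperience` record from the sketch above.

```python
import torch

def world_modeling_loss(model, tokenizer, rec):
    """Implicit world modeling: predict the next state from (state, action).

    Training on agent-generated transitions grounds the policy's language
    model in the environment's dynamics (assumed prompt format)."""
    prompt = f"State: {rec.state}\nAction: {rec.action}\nNext state:"
    target = f" {rec.next_state}"
    return lm_loss_on_target(model, tokenizer, prompt, target)

def self_reflection_loss(model, tokenizer, rec, reflection, better_action):
    """Self-reflection: given a suboptimal action and its observed outcome,
    train on an explanation of what went wrong plus the revised action."""
    prompt = (f"State: {rec.state}\nTried action: {rec.action}\n"
              f"Outcome: {rec.next_state}\nReflection and revised action:")
    target = f" {reflection} Action: {better_action}"
    return lm_loss_on_target(model, tokenizer, prompt, target)

def lm_loss_on_target(model, tokenizer, prompt, target):
    """Cross-entropy on target tokens only; prompt tokens are masked out."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + target, return_tensors="pt").input_ids
    labels = full_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100  # ignore loss on the prompt
    return model(input_ids=full_ids, labels=labels).loss
```

Both losses are plain supervised objectives over agent-generated data, which is what lets early experience sidestep reward engineering while still exposing the policy to consequences of its own actions.
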
Authors
Kai Zhang
Meta Superintelligence Labs
Xiangchao Chen
The Ohio State University
Bo Liu
FAIR at Meta
Tianci Xue
The Ohio State University
NLP
Zeyi Liao
The Ohio State University
AI, NLP, Multimodal, Agent
Zhihan Liu
Northwestern University
Large Language Models, Reinforcement Learning, Offline Learning, Online Learning
Xiyao Wang
Ph.D., University of Maryland, College Park
World Model, Embodied AI, Multimodal LLM
Yuting Ning
The Ohio State University
Natural Language Processing
Zhaorun Chen
Ph.D. Student, UChicago CS
AI Safety, LLM Agent, Reinforcement Learning
Xiaohan Fu
Meta Superintelligence Labs
Jian Xie
The Ohio State University
Yuxuan Sun
The Ohio State University
Boyu Gou
The Ohio State University
Artificial Intelligence, Language Agents, GUI Agents
Qi Qi
Meta Superintelligence Labs
Zihang Meng
Meta Superintelligence Labs
Jianwei Yang
Research Scientist, Meta Superintelligence Labs
Multimodal Agentic AI
Ning Zhang
Meta Superintelligence Labs
Xian Li
FAIR at Meta
Ashish Shah
Meta Superintelligence Labs
Dat Huynh
Meta Superintelligence Labs
Hengduo Li
Meta Superintelligence Labs
Zi Yang
Meta Superintelligence Labs
Sara Cao
Meta Superintelligence Labs
Lawrence Jang
Meta Superintelligence Labs
Shuyan Zhou
Duke University
Large Language Models, AI Agent
Jiacheng Zhu
MIT
Machine Learning, Foundation Models, Optimal Transport, Bayesian Modeling
Huan Sun
Endowed CoE Innovation Scholar and Associate Professor, The Ohio State University
Agents, Large Language Models, Natural Language Processing, AI
Jason Weston
Meta
Artificial Intelligence, Machine Learning, Bioinformatics, Vision, Natural Language Processing
Yu Su
The Ohio State University
Yifan Wu
Meta Superintelligence Labs