RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes RetroAgent, a novel framework that addresses the tendency of conventional reinforcement learning agents to converge to suboptimal policies in complex interactive tasks due to insufficient exploration and the absence of explicit experience reuse mechanisms. RetroAgent introduces a retrospective self-reflection mechanism that generates dual forms of intrinsic feedback—numerical and linguistic. The numerical feedback drives efficient exploration, while the linguistic feedback distills reusable experiences into a language-based memory buffer. A similarity- and utility-aware SimUtil-UCB strategy dynamically retrieves relevant past experiences during decision-making. By explicitly externalizing and iteratively refining experiential knowledge, RetroAgent achieves state-of-the-art performance, yielding absolute improvements of 18.3%, 15.4%, 27.1%, and 8.9% on ALFWorld, WebShop, Sokoban, and MineSweeper, respectively, and demonstrates strong test-time adaptation and out-of-distribution generalization capabilities.
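The numerical feedback described above rewards incremental subtask completion relative to prior attempts. The paper's exact formulation is not given here, so the following is a minimal sketch under assumed semantics: a tracker remembers the best subtask count achieved per task so far and rewards only progress beyond that mark, so repeating known partial progress earns nothing while new progress is reinforced. The class name `ProgressTracker` and the `scale` parameter are hypothetical.

```python
class ProgressTracker:
    """Hypothetical sketch of retrospective numerical intrinsic feedback:
    reward only subtask progress beyond the best prior attempt on a task."""

    def __init__(self):
        self.best = {}  # task id -> best subtask count observed so far

    def reward(self, task_id, subtasks_done, scale=1.0):
        prior = self.best.get(task_id, 0)
        # Intrinsic reward is the improvement over the best prior attempt.
        gain = max(0, subtasks_done - prior)
        # Update the retrospective record for future comparisons.
        self.best[task_id] = max(prior, subtasks_done)
        return scale * gain
```

Under this scheme, an episode that completes 3 of 5 subtasks after a best prior of 2 yields an intrinsic reward of 1, while a stagnant repeat yields 0, which is one plausible way to make exploration of new progress self-rewarding.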

📝 Abstract
Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solving over continuous adaptation: agents often converge to suboptimal strategies due to insufficient exploration, while learned knowledge remains implicit within parameters rather than explicitly retrievable, limiting effective experiential learning. To address these limitations, we introduce RetroAgent, an online RL framework that empowers agents to master complex interactive environments not just by solving, but by evolving. Concretely, RetroAgent features a hindsight self-reflection mechanism that produces dual intrinsic feedback: (1) intrinsic numerical feedback that tracks incremental subtask completion relative to prior attempts, rewarding promising explorations, and (2) intrinsic language feedback that distills reusable lessons into a memory buffer, retrieved via our proposed Similarity&Utility-Aware Upper Confidence Bound (SimUtil-UCB) strategy, which balances relevance, utility, and exploration to effectively leverage past experiences. Extensive experiments on two model families across four challenging agentic tasks demonstrate that RetroAgent significantly outperforms existing methods, achieving state-of-the-art results -- e.g., surpassing Group Relative Policy Optimization (GRPO)-trained agents by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper -- while exhibiting strong test-time adaptation and generalization to out-of-distribution scenarios.
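The abstract describes SimUtil-UCB as a retrieval rule that balances relevance, utility, and exploration in the UCB style. The paper's exact formula is not reproduced here; the sketch below assumes a standard linear combination of a similarity term, a utility term, and the classic UCB exploration bonus over retrieval counts. All names (`simutil_ucb_score`, `retrieve`) and the weights `alpha`, `beta`, `c` are illustrative assumptions, not the authors' notation.

```python
import math

def simutil_ucb_score(similarity, utility, n_retrievals, total_steps,
                      alpha=0.5, beta=0.5, c=1.0):
    """Assumed SimUtil-UCB form: relevance + utility + UCB exploration bonus.
    Rarely retrieved memories get a larger bonus, encouraging their reuse."""
    exploration = c * math.sqrt(math.log(total_steps + 1) / (n_retrievals + 1))
    return alpha * similarity + beta * utility + exploration

def retrieve(memory, query_sims, total_steps, k=1):
    """Return indices of the top-k memory entries by SimUtil-UCB score.
    `memory[i]` holds a recorded 'utility' and retrieval 'count';
    `query_sims[i]` is the similarity of entry i to the current query."""
    ranked = sorted(
        range(len(memory)),
        key=lambda i: simutil_ucb_score(query_sims[i],
                                        memory[i]["utility"],
                                        memory[i]["count"],
                                        total_steps),
        reverse=True,
    )
    return ranked[:k]
```

With well-explored memories (equal retrieval counts), the rule reduces to ranking by similarity plus utility; for fresh entries, the bonus dominates, which is the usual UCB trade-off between exploiting known-good experiences and trying underused ones.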
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
large language model
exploration
experiential learning
adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

RetroAgent
hindsight self-reflection
intrinsic feedback
SimUtil-UCB
experiential learning