RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes RetroAgent, a novel framework that addresses the tendency of conventional reinforcement learning agents to converge to suboptimal policies in complex interactive tasks due to insufficient exploration and the absence of explicit experience reuse mechanisms. RetroAgent introduces a retrospective self-reflection mechanism that generates dual forms of intrinsic feedback—numerical and linguistic. The numerical feedback drives efficient exploration, while the linguistic feedback distills reusable experiences into a language-based memory buffer. A similarity- and utility-aware SimUtil-UCB strategy dynamically retrieves relevant past experiences during decision-making. By explicitly externalizing and iteratively refining experiential knowledge, RetroAgent achieves state-of-the-art performance, yielding absolute improvements of 18.3%, 15.4%, 27.1%, and 8.9% on ALFWorld, WebShop, Sokoban, and MineSweeper, respectively, and demonstrates strong test-time adaptation and out-of-distribution generalization capabilities.
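The numerical feedback described above rewards incremental subtask completion relative to prior attempts. The paper's exact formulation is not given here, so the following is a minimal sketch under assumed semantics: a tracker remembers the best subtask count achieved per task so far and rewards only progress beyond that mark, so repeating known partial progress earns nothing while new progress is reinforced. The class name `ProgressTracker` and the `scale` parameter are hypothetical.

```python
class ProgressTracker:
    """Hypothetical sketch of retrospective numerical intrinsic feedback:
    reward only subtask progress beyond the best prior attempt on a task."""

    def __init__(self):
        self.best = {}  # task id -> best subtask count observed so far

    def reward(self, task_id, subtasks_done, scale=1.0):
        prior = self.best.get(task_id, 0)
        # Intrinsic reward is the improvement over the best prior attempt.
        gain = max(0, subtasks_done - prior)
        # Update the retrospective record for future comparisons.
        self.best[task_id] = max(prior, subtasks_done)
        return scale * gain
```

Under this scheme, an episode that completes 3 of 5 subtasks after a best prior of 2 yields an intrinsic reward of 1, while a stagnant repeat yields 0, which is one plausible way to make exploration of new progress self-rewarding.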

📝 Abstract
Large language model (LLM)-based agents trained with reinforcement learning (RL) have shown strong potential on complex interactive tasks. However, standard RL paradigms favor static problem-solving over continuous adaptation: agents often converge to suboptimal strategies due to insufficient exploration, while learned knowledge remains implicit within parameters rather than explicitly retrievable, limiting effective experiential learning. To address these limitations, we introduce RetroAgent, an online RL framework that empowers agents to master complex interactive environments not just by solving, but by evolving. Concretely, RetroAgent features a hindsight self-reflection mechanism that produces dual intrinsic feedback: (1) intrinsic numerical feedback that tracks incremental subtask completion relative to prior attempts, rewarding promising explorations, and (2) intrinsic language feedback that distills reusable lessons into a memory buffer, retrieved via our proposed Similarity&Utility-Aware Upper Confidence Bound (SimUtil-UCB) strategy, which balances relevance, utility, and exploration to effectively leverage past experiences. Extensive experiments on two model families across four challenging agentic tasks demonstrate that RetroAgent significantly outperforms existing methods, achieving state-of-the-art results -- e.g., surpassing Group Relative Policy Optimization (GRPO)-trained agents by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper -- while exhibiting strong test-time adaptation and generalization to out-of-distribution scenarios.
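The abstract describes SimUtil-UCB as a retrieval rule that balances relevance, utility, and exploration in the UCB style. The paper's exact formula is not reproduced here; the sketch below assumes a standard linear combination of a similarity term, a utility term, and the classic UCB exploration bonus over retrieval counts. All names (`simutil_ucb_score`, `retrieve`) and the weights `alpha`, `beta`, `c` are illustrative assumptions, not the authors' notation.

```python
import math

def simutil_ucb_score(similarity, utility, n_retrievals, total_steps,
                      alpha=0.5, beta=0.5, c=1.0):
    """Assumed SimUtil-UCB form: relevance + utility + UCB exploration bonus.
    Rarely retrieved memories get a larger bonus, encouraging their reuse."""
    exploration = c * math.sqrt(math.log(total_steps + 1) / (n_retrievals + 1))
    return alpha * similarity + beta * utility + exploration

def retrieve(memory, query_sims, total_steps, k=1):
    """Return indices of the top-k memory entries by SimUtil-UCB score.
    `memory[i]` holds a recorded 'utility' and retrieval 'count';
    `query_sims[i]` is the similarity of entry i to the current query."""
    ranked = sorted(
        range(len(memory)),
        key=lambda i: simutil_ucb_score(query_sims[i],
                                        memory[i]["utility"],
                                        memory[i]["count"],
                                        total_steps),
        reverse=True,
    )
    return ranked[:k]
```

With well-explored memories (equal retrieval counts), the rule reduces to ranking by similarity plus utility; for fresh entries, the bonus dominates, which is the usual UCB trade-off between exploiting known-good experiences and trying underused ones.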
Problem

Research questions and friction points this paper is trying to address.

reinforcement learning
large language model
exploration
experiential learning
adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

RetroAgent
hindsight self-reflection
intrinsic feedback
SimUtil-UCB
experiential learning