The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

📅 2025-09-02
🤖 AI Summary
Problem: Large language models (LLMs) are, by default, passive sequence generators, lacking the autonomous decision-making capabilities essential for intelligent agenthood. Method: The survey introduces "Agentic Reinforcement Learning" (Agentic RL), a paradigm grounded in long-horizon, partially observable Markov decision processes (POMDPs), in contrast to the conventional single-step MDP formulation of LLM-RL. This framework systematically integrates tool invocation, external memory, multi-step reasoning, and environment interaction. Contribution/Results: Through an analysis of over 500 recent works, the survey establishes a dual taxonomy: one characterizing core agent capabilities (planning, perception, action, and adaptation), the other mapping applications across scientific discovery, software engineering, and embodied interaction. It delineates the theoretical boundaries and capability landscape of Agentic RL and provides an open-sourced resource compendium to advance research on general-purpose AI agents.

📝 Abstract
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.
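The formal contrast the abstract draws, single-step LLM-RL versus temporally extended Agentic RL, can be sketched with standard notation. The tuples below follow the usual MDP/POMDP definitions and are illustrative, not the paper's exact formalization.

```latex
% LLM-RL as a degenerate single-step MDP: the state s is the prompt,
% the action a is the full completion, and the episode ends after one reward.
\mathcal{M}_{\mathrm{LLM\text{-}RL}} = (\mathcal{S}, \mathcal{A}, r),
\qquad a \sim \pi_\theta(\cdot \mid s), \qquad T = 1

% Agentic RL as a long-horizon POMDP: at each step t the agent receives a
% partial observation o_t of the latent environment state s_t, acts (e.g.,
% a tool call, message, or environment action), and optimizes the
% discounted return over many steps.
\mathcal{M}_{\mathrm{Agentic}} = (\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma),
\qquad a_t \sim \pi_\theta(\cdot \mid o_{\le t},\, a_{<t}), \qquad
J(\theta) = \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{t}\, R(s_t, a_t)\right]
```

Under this reading, the survey's thesis is that the policy $\pi_\theta$ must condition on interaction history rather than a single prompt, which is what makes memory, planning, and tool use RL-trainable capabilities rather than fixed modules.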
Problem

Research questions and friction points this paper addresses.

Transitioning LLMs from passive sequence generators into autonomous, decision-making agents
Formalizing Agentic RL as temporally extended, partially observable MDPs (POMDPs)
Developing adaptive agent capabilities via reinforcement learning
Innovation

Methods, ideas, and system contributions that make the work stand out.

Grounding Agentic RL in long-horizon POMDPs rather than degenerate single-step MDPs
A twofold taxonomy spanning core agentic capabilities and their application domains
Reinforcement learning as the mechanism that turns static, heuristic modules into adaptive, robust agentic behavior
👥 Authors

Guibin Zhang
National University of Singapore
Multi-Agent System, Efficient AI

Hejia Geng
Researcher @ Oxford

Xiaohang Yu
Imperial College London

Zhenfei Yin
University of Oxford
Deep Learning, Multimodal, AI Agent, Robotics

Zaibin Zhang
Dalian University of Technology
LLMs, Multi-agent System, Multi-modal Agent

Zelin Tan
University of Science and Technology of China

Heng Zhou
Jiangnan University
Multi-modal Learning, Image Processing, Computer Vision, Remote Sensing

Zhongzhi Li
Institute of Automation, Chinese Academy of Sciences
LLM, NLP, Math Reasoning

Xiangyuan Xue
The Chinese University of Hong Kong
LLM-based Agent, Multi-agent System, Reinforcement Learning

Yijiang Li
Argonne National Laboratory

Yifan Zhou
University of Georgia

Yang Chen
Shanghai AI Laboratory

Chen Zhang
University of Science and Technology of China

Yutao Fan
Shanghai AI Laboratory

Zihu Wang
University of California, Santa Barbara

Songtao Huang
Fudan University, Shanghai AI Laboratory

Yue Liao
National University of Singapore
Computer Vision, Deep Learning, MLLM

Hongru Wang
The Chinese University of Hong Kong

Mengyue Yang
Lecturer, University of Bristol
Causality, Trustworthiness

Heng Ji
Professor of Computer Science, AICE Director, ASKS Director, UIUC, Amazon Scholar
Natural Language Processing, Large Language Models

Michael Littman
Brown University

Jun Wang
University College London

Shuicheng Yan
National University of Singapore

Philip Torr
Professor, Department of Engineering, University of Oxford

Lei Bai
Shanghai AI Laboratory
Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery