The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

📅 2025-09-02
🤖 AI Summary
Problem: Large language models (LLMs) are, by default, passive sequence generators, lacking the autonomous decision-making capabilities essential for intelligent agenthood. Method: The survey introduces "Agentic Reinforcement Learning" (Agentic RL), a paradigm grounded in long-horizon, partially observable Markov decision processes (POMDPs), in contrast to the conventional single-step MDP formulation of LLM-RL. This framework systematically integrates tool invocation, external memory, multi-step reasoning, and environment interaction. Contribution/Results: Through an analysis of over 500 recent works, the survey establishes a dual taxonomy: one characterizing core agent capabilities (planning, perception, action, and adaptation), the other mapping applications across scientific discovery, software engineering, and embodied interaction. It delineates the theoretical boundaries and capability landscape of Agentic RL and provides an open-sourced resource compendium to advance research on general-purpose AI agents.

📝 Abstract
The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from passive sequence generators into autonomous, decision-making agents embedded in complex, dynamic worlds. This survey formalizes this conceptual shift by contrasting the degenerate single-step Markov Decision Processes (MDPs) of LLM-RL with the temporally extended, partially observable Markov decision processes (POMDPs) that define Agentic RL. Building on this foundation, we propose a comprehensive twofold taxonomy: one organized around core agentic capabilities, including planning, tool use, memory, reasoning, self-improvement, and perception, and the other around their applications across diverse task domains. Central to our thesis is that reinforcement learning serves as the critical mechanism for transforming these capabilities from static, heuristic modules into adaptive, robust agentic behavior. To support and accelerate future research, we consolidate the landscape of open-source environments, benchmarks, and frameworks into a practical compendium. By synthesizing over five hundred recent works, this survey charts the contours of this rapidly evolving field and highlights the opportunities and challenges that will shape the development of scalable, general-purpose AI agents.
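The formal contrast the abstract draws, single-step LLM-RL versus temporally extended Agentic RL, can be sketched with standard notation. The tuples below follow the usual MDP/POMDP definitions and are illustrative, not the paper's exact formalization.

```latex
% LLM-RL as a degenerate single-step MDP: the state s is the prompt,
% the action a is the full completion, and the episode ends after one reward.
\mathcal{M}_{\mathrm{LLM\text{-}RL}} = (\mathcal{S}, \mathcal{A}, r),
\qquad a \sim \pi_\theta(\cdot \mid s), \qquad T = 1

% Agentic RL as a long-horizon POMDP: at each step t the agent receives a
% partial observation o_t of the latent environment state s_t, acts (e.g.,
% a tool call, message, or environment action), and optimizes the
% discounted return over many steps.
\mathcal{M}_{\mathrm{Agentic}} = (\mathcal{S}, \mathcal{A}, P, R, \Omega, O, \gamma),
\qquad a_t \sim \pi_\theta(\cdot \mid o_{\le t},\, a_{<t}), \qquad
J(\theta) = \mathbb{E}\!\left[\sum_{t=0}^{T} \gamma^{t}\, R(s_t, a_t)\right]
```

Under this reading, the survey's thesis is that the policy $\pi_\theta$ must condition on interaction history rather than a single prompt, which is what makes memory, planning, and tool use RL-trainable capabilities rather than fixed modules.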
Problem

Research questions and friction points this paper addresses.

Transitioning LLMs from passive sequence generators into autonomous, decision-making agents
Formalizing Agentic RL as temporally extended, partially observable MDPs (POMDPs)
Developing adaptive agent capabilities via reinforcement learning
Innovation

Methods, ideas, and system contributions that make the work stand out.

Grounding Agentic RL in long-horizon POMDPs rather than degenerate single-step MDPs
A twofold taxonomy spanning core agentic capabilities and their application domains
Reinforcement learning as the mechanism that turns static, heuristic modules into adaptive, robust agentic behavior
👥 Authors

Guibin Zhang
National University of Singapore
Multi-Agent System, Efficient AI

Hejia Geng
Researcher @ Oxford

Xiaohang Yu
Imperial College London

Zhenfei Yin
University of Oxford
Deep Learning, Multimodal, AI Agent, Robotics

Zaibin Zhang
Dalian University of Technology
LLMs, Multi-agent System, Multi-modal Agent

Zelin Tan
University of Science and Technology of China

Heng Zhou
Jiangnan University
Multi-modal Learning, Image Processing, Computer Vision, Remote Sensing

Zhongzhi Li
Institute of Automation, Chinese Academy of Sciences
LLM, NLP, Math Reasoning

Xiangyuan Xue
The Chinese University of Hong Kong
LLM-based Agent, Multi-agent System, Reinforcement Learning

Yijiang Li
Argonne National Laboratory

Yifan Zhou
University of Georgia

Yang Chen
Shanghai AI Laboratory

Chen Zhang
University of Science and Technology of China

Yutao Fan
Shanghai AI Laboratory

Zihu Wang
University of California, Santa Barbara

Songtao Huang
Fudan University, Shanghai AI Laboratory

Yue Liao
National University of Singapore
Computer Vision, Deep Learning, MLLM

Hongru Wang
The Chinese University of Hong Kong

Mengyue Yang
Lecturer, University of Bristol
Causality, Trustworthiness

Heng Ji
Professor of Computer Science, AICE Director, ASKS Director, UIUC, Amazon Scholar
Natural Language Processing, Large Language Models

Michael Littman
Brown University

Jun Wang
University College London

Shuicheng Yan
National University of Singapore

Philip Torr
Professor, Department of Engineering, University of Oxford

Lei Bai
Shanghai AI Laboratory
Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery