AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing language model (LM) agents lack systematic integration with reinforcement learning (RL), particularly in multi-turn interactive settings where scalability and algorithmic flexibility remain challenging to reconcile. This paper introduces AgentFly, a modular RL training framework tailored for LM-based agents. Our approach addresses these challenges through three key innovations: (1) a decorator-based unified interface enabling plug-and-play integration of tools, reward functions, and RL algorithms; (2) a token-level masking mechanism that supports fine-grained action-space control and efficient adaptation of standard RL methods to LM-generated outputs; and (3) asynchronous execution coupled with centralized resource management, ensuring stable, high-throughput training under multi-turn interaction. Experiments demonstrate that AgentFly significantly enhances agent autonomy in complex decision-making tasks while maintaining strong scalability and generalization across diverse environments and RL algorithms.
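The decorator-based interface described above can be pictured as a simple registry: functions are declared as tools or reward functions and become available to the training loop by name. A minimal sketch, with illustrative names that are not AgentFly's actual API:

```python
# Hypothetical registry pattern for plug-and-play tools and rewards.
# `tool`, `reward`, and the registry dicts are illustrative assumptions.
TOOLS = {}
REWARDS = {}

def tool(name):
    """Register a callable as an agent tool under `name`."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator

def reward(name):
    """Register a callable as a reward function under `name`."""
    def decorator(fn):
        REWARDS[name] = fn
        return fn
    return decorator

@tool("calculator")
def calculator(expression: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(expression))

@reward("exact_match")
def exact_match(prediction: str, target: str) -> float:
    # Binary reward: 1.0 if the agent's answer matches the target exactly.
    return 1.0 if prediction.strip() == target.strip() else 0.0

print(TOOLS["calculator"]("2 + 3"))        # "5"
print(REWARDS["exact_match"]("42", "42"))  # 1.0
```

The point of the pattern is that adding a new tool or reward requires only writing a decorated function, with no changes to the trainer itself.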

📝 Abstract
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised finetuning. At the same time, reinforcement learning (RL) has been explored to enhance LMs' capabilities, such as reasoning and factuality. However, the combination of LM agents and reinforcement learning (Agent-RL) remains underexplored and lacks systematic study. To this end, we built AgentFly, a scalable and extensible Agent-RL framework designed to empower LM agents with a variety of RL algorithms. Our framework supports multi-turn interactions by adapting traditional RL methods with token-level masking. It features a decorator-based interface for defining tools and reward functions, enabling seamless extension and ease of use. To support high-throughput training, we implement asynchronous execution of tool calls and reward computations, and design a centralized resource management system for scalable environment coordination. We also provide a suite of prebuilt tools and environments, demonstrating the framework's effectiveness through successful agent training across multiple tasks.
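The token-level masking mentioned in the abstract can be sketched as follows: in a multi-turn rollout, tokens returned by tools or the environment are interleaved with tokens the model generated, and only the latter should contribute to the policy-gradient loss. A minimal sketch, assuming a 0/1 action mask convention (this is an illustration, not AgentFly's exact implementation):

```python
# Token-level loss masking for multi-turn RL: environment/tool tokens are
# masked out so gradients flow only through model-generated tokens.
# The mask convention (1 = model token, 0 = tool token) is an assumption.

def masked_policy_loss(logprobs, advantages, action_mask):
    """Average policy-gradient loss over model-generated tokens only.

    logprobs, advantages, action_mask: equal-length sequences;
    action_mask[i] is 1 for model tokens, 0 for tool/environment tokens.
    """
    num = sum(lp * adv * m for lp, adv, m in zip(logprobs, advantages, action_mask))
    denom = sum(action_mask) or 1  # avoid division by zero on all-masked turns
    return -num / denom

# A five-token turn where tokens 2-3 came back from a tool call:
logprobs   = [-0.1, -0.2, -0.3, -0.4, -0.5]
advantages = [ 1.0,  1.0,  1.0,  1.0,  1.0]
mask       = [ 1,    1,    0,    0,    1]
print(masked_policy_loss(logprobs, advantages, mask))
```

With the mask applied, the tool-returned tokens 2-3 contribute nothing, which is what lets standard single-turn RL objectives be reused in multi-turn settings.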
Problem

Research questions and friction points this paper is trying to address.

Combining LM agents with reinforcement learning lacks systematic study
Need scalable framework for multi-turn Agent-RL interactions
Require high-throughput training for tool calls and rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Agent-RL framework with diverse RL algorithms
Token-level masking for multi-turn interactions
Decorator-based interface for tool and reward definition
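The high-throughput angle above rests on running I/O-bound tool calls concurrently rather than serially. A minimal `asyncio` sketch of the idea (a simplification; the function names are illustrative, not the framework's API):

```python
# Asynchronous tool-call execution: a batch of agents' tool calls runs
# concurrently, so total wall time tracks the slowest call, not the sum.
import asyncio

async def call_tool(agent_id: int, delay: float) -> str:
    # Stand-in for an I/O-bound tool call (API request, sandboxed code run).
    await asyncio.sleep(delay)
    return f"agent-{agent_id}: done"

async def run_batch(n: int):
    # Launch all tool calls at once; asyncio.gather preserves input order.
    tasks = [call_tool(i, 0.01) for i in range(n)]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_batch(4))
print(results)  # ['agent-0: done', 'agent-1: done', 'agent-2: done', 'agent-3: done']
```

In a real trainer this pattern keeps GPUs busy: rollouts whose tool calls are still pending do not block the rest of the batch.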
Renxi Wang
MBZUAI
Natural Language Processing

Rifo Ahmad Genadi
Mohamed bin Zayed University of Artificial Intelligence

Bilal El Bouardi
Mohamed bin Zayed University of Artificial Intelligence

Yongxin Wang
Mohamed bin Zayed University of Artificial Intelligence

Fajri Koto
Assistant Professor (tenure-track), MBZUAI
Computational Linguistics, Natural Language Processing, Multilingual NLP, Human-centered NLP

Zhengzhong Liu
Institute of Foundation Models
Natural Language Processing, Machine Learning

Timothy Baldwin
MBZUAI and The University of Melbourne
Computational Linguistics, Natural Language Processing, Artificial Intelligence

Haonan Li
Mohamed bin Zayed University of Artificial Intelligence