AgentFly: Extensible and Scalable Reinforcement Learning for LM Agents

📅 2025-07-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing language model (LM) agents lack systematic integration with reinforcement learning (RL), particularly in multi-turn interactive settings where scalability and algorithmic flexibility remain challenging to reconcile. This paper introduces AgentFly, a modular RL training framework tailored for LM-based agents. Our approach addresses these challenges through three key innovations: (1) a decorator-based unified interface enabling plug-and-play integration of tools, reward functions, and RL algorithms; (2) a token-level masking mechanism that supports fine-grained action-space control and efficient adaptation of standard RL methods to LM-generated outputs; and (3) asynchronous execution coupled with centralized resource management, ensuring stable, high-throughput training under multi-turn interaction. Experiments demonstrate that AgentFly significantly enhances agent autonomy in complex decision-making tasks while maintaining strong scalability and generalization across diverse environments and RL algorithms.
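The decorator-based interface described above can be pictured as a simple registry: functions are declared as tools or reward functions and become available to the training loop by name. A minimal sketch, with illustrative names that are not AgentFly's actual API:

```python
# Hypothetical registry pattern for plug-and-play tools and rewards.
# `tool`, `reward`, and the registry dicts are illustrative assumptions.
TOOLS = {}
REWARDS = {}

def tool(name):
    """Register a callable as an agent tool under `name`."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator

def reward(name):
    """Register a callable as a reward function under `name`."""
    def decorator(fn):
        REWARDS[name] = fn
        return fn
    return decorator

@tool("calculator")
def calculator(expression: str) -> str:
    # Toy tool: evaluate a simple arithmetic expression.
    return str(eval(expression))

@reward("exact_match")
def exact_match(prediction: str, target: str) -> float:
    # Binary reward: 1.0 if the agent's answer matches the target exactly.
    return 1.0 if prediction.strip() == target.strip() else 0.0

print(TOOLS["calculator"]("2 + 3"))        # "5"
print(REWARDS["exact_match"]("42", "42"))  # 1.0
```

The point of the pattern is that adding a new tool or reward requires only writing a decorated function, with no changes to the trainer itself.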

📝 Abstract
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised finetuning. At the same time, reinforcement learning (RL) has been explored to enhance LMs' capabilities, such as reasoning and factuality. However, the combination of LM agents and reinforcement learning (Agent-RL) remains underexplored and lacks systematic study. To this end, we built AgentFly, a scalable and extensible Agent-RL framework designed to empower LM agents with a variety of RL algorithms. Our framework supports multi-turn interactions by adapting traditional RL methods with token-level masking. It features a decorator-based interface for defining tools and reward functions, enabling seamless extension and ease of use. To support high-throughput training, we implement asynchronous execution of tool calls and reward computations, and design a centralized resource management system for scalable environment coordination. We also provide a suite of prebuilt tools and environments, demonstrating the framework's effectiveness through successful agent training across multiple tasks.
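The token-level masking mentioned in the abstract can be sketched as follows: in a multi-turn rollout, tokens returned by tools or the environment are interleaved with tokens the model generated, and only the latter should contribute to the policy-gradient loss. A minimal sketch, assuming a 0/1 action mask convention (this is an illustration, not AgentFly's exact implementation):

```python
# Token-level loss masking for multi-turn RL: environment/tool tokens are
# masked out so gradients flow only through model-generated tokens.
# The mask convention (1 = model token, 0 = tool token) is an assumption.

def masked_policy_loss(logprobs, advantages, action_mask):
    """Average policy-gradient loss over model-generated tokens only.

    logprobs, advantages, action_mask: equal-length sequences;
    action_mask[i] is 1 for model tokens, 0 for tool/environment tokens.
    """
    num = sum(lp * adv * m for lp, adv, m in zip(logprobs, advantages, action_mask))
    denom = sum(action_mask) or 1  # avoid division by zero on all-masked turns
    return -num / denom

# A five-token turn where tokens 2-3 came back from a tool call:
logprobs   = [-0.1, -0.2, -0.3, -0.4, -0.5]
advantages = [ 1.0,  1.0,  1.0,  1.0,  1.0]
mask       = [ 1,    1,    0,    0,    1]
print(masked_policy_loss(logprobs, advantages, mask))
```

With the mask applied, the tool-returned tokens 2-3 contribute nothing, which is what lets standard single-turn RL objectives be reused in multi-turn settings.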
Problem

Research questions and friction points this paper is trying to address.

Combining LM agents with reinforcement learning lacks systematic study
Need scalable framework for multi-turn Agent-RL interactions
Require high-throughput training for tool calls and rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scalable Agent-RL framework with diverse RL algorithms
Token-level masking for multi-turn interactions
Decorator-based interface for tool and reward definition
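The high-throughput angle above rests on running I/O-bound tool calls concurrently rather than serially. A minimal `asyncio` sketch of the idea (a simplification; the function names are illustrative, not the framework's API):

```python
# Asynchronous tool-call execution: a batch of agents' tool calls runs
# concurrently, so total wall time tracks the slowest call, not the sum.
import asyncio

async def call_tool(agent_id: int, delay: float) -> str:
    # Stand-in for an I/O-bound tool call (API request, sandboxed code run).
    await asyncio.sleep(delay)
    return f"agent-{agent_id}: done"

async def run_batch(n: int):
    # Launch all tool calls at once; asyncio.gather preserves input order.
    tasks = [call_tool(i, 0.01) for i in range(n)]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_batch(4))
print(results)  # ['agent-0: done', 'agent-1: done', 'agent-2: done', 'agent-3: done']
```

In a real trainer this pattern keeps GPUs busy: rollouts whose tool calls are still pending do not block the rest of the batch.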
Renxi Wang
MBZUAI
Natural Language Processing

Rifo Ahmad Genadi
Mohamed bin Zayed University of Artificial Intelligence

Bilal El Bouardi
Mohamed bin Zayed University of Artificial Intelligence

Yongxin Wang
Mohamed bin Zayed University of Artificial Intelligence

Fajri Koto
Assistant Professor (tenure-track), MBZUAI
Computational Linguistics, Natural Language Processing, Multilingual NLP, Human-centered NLP

Zhengzhong Liu
Institute of Foundation Models
Natural Language Processing, Machine Learning

Timothy Baldwin
MBZUAI and The University of Melbourne
Computational Linguistics, Natural Language Processing, Artificial Intelligence

Haonan Li
Mohamed bin Zayed University of Artificial Intelligence