🤖 AI Summary
Existing language model (LM) agents lack systematic integration with reinforcement learning (RL), particularly in multi-turn interactive settings where scalability and algorithmic flexibility remain challenging to reconcile. This paper introduces AgentFly, a modular RL training framework tailored for LM-based agents. Our approach addresses these challenges through three key innovations: (1) a decorator-based unified interface enabling plug-and-play integration of tools, reward functions, and RL algorithms; (2) a token-level masking mechanism that supports fine-grained action-space control and efficient adaptation of standard RL methods to LM-generated outputs; and (3) asynchronous execution coupled with centralized resource management, ensuring stable, high-throughput training under multi-turn interaction. Experiments demonstrate that AgentFly significantly enhances agent autonomy in complex decision-making tasks while maintaining strong scalability and generalization across diverse environments and RL algorithms.
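To make the first innovation concrete, below is a minimal sketch of how a decorator-based registry for tools and reward functions could look. The names `tool`, `reward`, and the registry dictionaries are illustrative assumptions, not AgentFly's actual API.

```python
# Hypothetical decorator-based registration of tools and reward functions,
# sketching the plug-and-play interface described above (not AgentFly's API).

TOOLS = {}    # name -> callable the agent may invoke
REWARDS = {}  # name -> callable scoring a rollout

def tool(name=None):
    """Register a callable as an agent tool under an optional name."""
    def decorator(fn):
        TOOLS[name or fn.__name__] = fn
        return fn
    return decorator

def reward(name=None):
    """Register a callable as a reward function."""
    def decorator(fn):
        REWARDS[name or fn.__name__] = fn
        return fn
    return decorator

@tool()
def calculator(expression: str) -> str:
    # Evaluate a simple arithmetic expression (illustrative tool only).
    return str(eval(expression, {"__builtins__": {}}, {}))

@reward(name="exact_match")
def exact_match(prediction: str, target: str) -> float:
    # Binary reward: 1.0 when the prediction matches the target exactly.
    return 1.0 if prediction.strip() == target.strip() else 0.0
```

With this pattern, adding a new tool or reward is a matter of decorating a function; the training loop can look up both by name, which is what makes such an interface feel plug-and-play.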
📝 Abstract
Language model (LM) agents have gained significant attention for their ability to autonomously complete tasks through interactions with environments, tools, and APIs. LM agents are primarily built with prompt engineering or supervised finetuning. At the same time, reinforcement learning (RL) has been explored to enhance LMs' capabilities, such as reasoning and factuality. However, the combination of LM agents and reinforcement learning (Agent-RL) remains underexplored and lacks systematic study. To this end, we build AgentFly, a scalable and extensible Agent-RL framework designed to empower LM agents with a variety of RL algorithms. Our framework supports multi-turn interactions by adapting traditional RL methods with token-level masking. It features a decorator-based interface for defining tools and reward functions, enabling seamless extension and ease of use. To support high-throughput training, we implement asynchronous execution of tool calls and reward computations, and design a centralized resource management system for scalable environment coordination. We also provide a suite of prebuilt tools and environments, demonstrating the framework's effectiveness through successful agent training across multiple tasks.
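The token-level masking idea can be sketched as follows: in a multi-turn rollout, tokens injected by the environment (tool outputs, observations) are masked out so the RL loss is computed only over tokens the model itself generated. The segment representation and masking rule below are assumptions for illustration, not AgentFly's exact implementation.

```python
# Illustrative token-level masking for multi-turn RL: only model-generated
# tokens contribute to the policy loss; environment-injected tokens get mask 0.

def build_loss_mask(segments):
    """segments: list of (token_ids, is_model_generated) pairs for one rollout.
    Returns the flattened token ids and a 0/1 loss mask of equal length."""
    token_ids, mask = [], []
    for ids, generated in segments:
        token_ids.extend(ids)
        mask.extend([1 if generated else 0] * len(ids))
    return token_ids, mask

def masked_mean_loss(per_token_losses, mask):
    """Average per-token losses over model-generated tokens only."""
    total = sum(l * m for l, m in zip(per_token_losses, mask))
    return total / max(sum(mask), 1)

# Example rollout: assistant turn, tool output, assistant turn.
rollout = [([11, 12, 13], True), ([21, 22], False), ([31], True)]
ids, mask = build_loss_mask(rollout)  # mask == [1, 1, 1, 0, 0, 1]
```

Because the mask is applied per token, standard single-turn RL objectives (e.g. PPO-style losses) carry over to interleaved agent/environment sequences without changing the algorithm itself.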
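For the asynchronous execution of tool calls, the core pattern is dispatching all pending calls concurrently and gathering their results, so slow I/O-bound tools do not serialize a training step. The sketch below uses Python's `asyncio`; the tool-call simulation is a stand-in assumption, not AgentFly's scheduler.

```python
# Minimal sketch of asynchronous tool-call execution with asyncio:
# all calls in a turn run concurrently rather than one after another.
import asyncio

async def call_tool(name: str, arg: str) -> str:
    # Simulated I/O-bound tool call (network request, sandboxed code, etc.).
    await asyncio.sleep(0.01)
    return f"{name}:{arg}"

async def run_turn(calls):
    # Dispatch every tool call of the turn concurrently and collect results
    # in order; total latency is roughly that of the slowest call.
    return await asyncio.gather(*(call_tool(n, a) for n, a in calls))

results = asyncio.run(run_turn([("search", "q1"), ("calc", "2+2")]))
```

A centralized resource manager would sit above such a loop, capping concurrency per environment (e.g. with an `asyncio.Semaphore`) so many parallel rollouts share tools and sandboxes without overload.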