🤖 AI Summary
To address three challenges in training AI agents to use tools (manual reward design, scarce training data, and weak multi-tool coordination), this paper proposes ToolBrain, a lightweight reinforcement learning framework. The framework supports customizable reward functions and automated LLM-based scoring, and it integrates knowledge distillation, automated task generation, and tool retrieval. Supported training strategies include GRPO, DPO, and supervised fine-tuning. Fine-tuning is made efficient with QLoRA through Unsloth, and bitsandbytes provides quantized inference. Evaluated on a CodeAct agent, the framework improves tool-call accuracy by up to 30.0%, accelerates training convergence, and reduces GPU memory consumption by 40%. Its code is concise, modular, and extensible, which lowers the cost of adapting agents to new domains.
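As a rough illustration of the GRPO strategy named in the summary, the sketch below computes group-relative advantages: several rollouts of the same task are scored, and each reward is normalized against the group's mean and standard deviation. This is a minimal sketch of the general GRPO idea, not ToolBrain's implementation; the function name `group_relative_advantages`, the `eps` parameter, and the choice of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group of rollouts.

    Hypothetical helper for illustration only. `rewards` holds one scalar
    reward per rollout sampled for the same task.
    """
    mu = mean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample standard deviation instead.
    sigma = pstdev(rewards)
    # eps avoids division by zero when all rollouts earn the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of one task: two succeeded, two failed.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.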
📝 Abstract
Effective tool use is essential for agentic AI, yet training agents to use tools remains challenging because of manually designed rewards, limited training data, and poor multi-tool selection, which lead to slow adaptation, wasted computational resources, and suboptimal performance. We introduce ToolBrain, a lightweight and user-friendly framework for coaching tool use in agentic models with flexible reinforcement learning (RL), lowering the barriers for researchers and practitioners who want to adapt LLM-based agents to specific domains. It supports a wide range of training strategies, including RL algorithms such as GRPO and DPO, as well as supervised learning. ToolBrain lets users define custom reward callables that operate directly on an agent's execution traces, or rely on an automated LLM-as-a-judge system for reward generation. It also provides knowledge distillation from large to small models for efficient development, automatic task generation from tool descriptions, seamless tool retrieval, efficient fine-tuning pipelines with QLoRA through Unsloth, and quantized inference via bitsandbytes. We demonstrate ToolBrain through diverse use cases, such as training a CodeAct agent to autonomously execute email search tasks, showing fast, targeted improvements (up to 30.0%) in tool-use skills while keeping the codebase simple and extensible. Our framework is publicly available at https://toolbrain.org.
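To make the idea of a reward callable on execution traces more concrete, here is a minimal sketch for an email search task. It does not use ToolBrain's actual API: the `ToolCall` and `Trace` types, the `email_search_reward` function, and the weights are all hypothetical, chosen only to show how a plain function can turn a recorded run into a scalar reward.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ToolCall:
    """One step of a hypothetical execution trace (not ToolBrain's own type)."""
    tool_name: str
    arguments: dict[str, Any]
    error: Optional[str] = None  # set when the tool call failed


@dataclass
class Trace:
    """Hypothetical record of one agent run: its tool calls and final answer."""
    calls: list[ToolCall] = field(default_factory=list)
    final_answer: str = ""


def email_search_reward(trace: Trace, expected_id: str, max_calls: int = 5) -> float:
    """Score a trace in [0, 1]; the weights below are illustrative assumptions."""
    if not trace.calls:
        return 0.0  # the agent never used a tool
    # Correctness: does the final answer mention the expected email id?
    correct = 1.0 if expected_id in trace.final_answer else 0.0
    # Fraction of tool calls that ended in an error.
    error_rate = sum(c.error is not None for c in trace.calls) / len(trace.calls)
    # Number of calls beyond the allowed budget.
    extra_calls = max(0, len(trace.calls) - max_calls)
    score = 0.7 * correct + 0.3 * (1.0 - error_rate) - 0.05 * extra_calls
    return max(0.0, min(1.0, score))  # clamp to [0, 1]


good_run = Trace(
    calls=[ToolCall("search_email", {"query": "invoice March"})],
    final_answer="The invoice is in message msg-042.",
)
reward = email_search_reward(good_run, expected_id="msg-042")
```

Keeping the reward a plain function of the trace means the same rollout data can be scored by a hand-written rule like this one or by an LLM judge without changing the training loop.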