ToolBrain: A Flexible Reinforcement Learning Framework for Agentic Tools

📅 2025-09-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address challenges in AI agent tool usage—including manual reward design, scarce training data, and weak multi-tool coordination—this paper proposes a lightweight reinforcement learning framework. The framework supports customizable reward functions and LLM-based automated scoring, integrating knowledge distillation, automated task generation, and seamless tool retrieval; it further incorporates GRPO, DPO, and supervised fine-tuning. Efficient parameter optimization is achieved via QLoRA and Unsloth, complemented by bitsandbytes for quantized inference. Evaluated on the CodeAct agent, the framework improves tool-call accuracy by 30.0%, accelerates training convergence, reduces GPU memory consumption by 40%, and delivers concise, modular, and highly extensible code—significantly lowering cross-domain adaptation costs.

Technology Category

Application Category

📝 Abstract
Effective tool use is essential for agentic AI, yet training agents to utilize tools remains challenging due to manually designed rewards, limited training data, and poor multi-tool selection, resulting in slow adaptation, wasted computational resources, and suboptimal performance. We introduce ToolBrain, a lightweight and user-friendly framework for coaching tool use in agentic models with flexible reinforcement learning (RL), easing the barriers for researchers and practitioners to adapt LLM-based agents to specific domains. It supports a wide range of training strategies, including RL algorithms such as GRPO and DPO, as well as supervised learning. ToolBrain enables custom reward callables directly on an agent's execution traces or simply utilizes an automated LLM-as-a-judge system for reward generation. It is packed with useful capabilities, including knowledge distillation from large to small models for efficient development, automatic task generation from tool descriptions, seamless tool retrieval, efficient fine-tuning pipelines with QLoRA through Unsloth, and quantized inference via bitsandbytes. We demonstrate ToolBrain through diverse use cases, such as training a CodeAct agent to autonomously execute email search tasks, showing fast, targeted improvements (up to 30.0%) in tool-use skills while keeping the codebase simple and extensible in Agentic AI. Our framework is publicly available at https://toolbrain.org.
Problem

Research questions and friction points this paper is trying to address.

Training AI agents to effectively use tools faces multiple challenges
Existing methods struggle with reward design and multi-tool selection
Current approaches result in slow adaptation and suboptimal performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flexible reinforcement learning framework for agentic tools
Automated LLM-as-a-judge system for reward generation
Efficient fine-tuning pipelines with QLoRA and Unsloth
Q
Quy Minh Le
ToolBrain Research, Ireland
M
Minh Sao Khue Luu
ToolBrain Research, Ireland
Khanh-Tung Tran
Khanh-Tung Tran
PhD Student, University College Cork
Large Language ModelLow-resource ScenarioMulti-agent
D
Duc-Hai Nguyen
University College Cork, Ireland
H
Hoang-Quoc-Viet Pham
ToolBrain Research, Ireland
Q
Quan Le
CeADAR University College Dublin, Ireland
Hoang Thanh Lam
Hoang Thanh Lam
Research staff, IBM research, Dublin, Ireland
Data mining and machine learning
Hoang D. Nguyen
Hoang D. Nguyen
Associate Professor, University College Cork, National University of Ireland, Cork, Ireland
Reliable Machine LearningAgentic AIDecision OptimizationSDGs