ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-quality tool-use trajectory data is scarce, hindering large language models' reliable invocation of external tools in complex tasks. Existing approaches predominantly rely on dialogue-level correctness verification, which fails to suppress error propagation at the level of individual turns. This paper introduces ToolMind, a framework that models tool semantics via a function graph and simulates realistic interactions through the collaboration of three agents (user, assistant, and tool) to generate high-fidelity multi-turn trajectories. It further proposes a fine-grained, turn-level filtering mechanism that identifies and removes erroneous steps before training while preserving self-corrective reasoning signals. Combined with parameter-correlation analysis and synthetic data augmentation, ToolMind substantially improves trajectory quality. Evaluation across multiple benchmarks shows that ToolMind-finetuned models outperform strong baselines in both tool-call accuracy and multi-step reasoning.

📝 Abstract
Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still hinders the development of stronger LLM agents. Most existing works on multi-turn dialogue synthesis validate correctness only at the trajectory level, which may overlook turn-level errors that can propagate during training and degrade model performance. To address these limitations, we introduce ToolMind, a large-scale, high-quality tool-agentic dataset with 160k synthetic data instances generated using over 20k tools and 200k augmented open-source data instances. Our data synthesis pipeline first constructs a function graph based on parameter correlations and then uses a multi-agent framework to simulate realistic user-assistant-tool interactions. Beyond trajectory-level validation, we employ fine-grained turn-level filtering to remove erroneous or suboptimal steps, ensuring that only high-quality reasoning traces are retained. This approach mitigates error amplification during training while preserving self-corrective reasoning signals essential for robust tool-use learning. Models fine-tuned on ToolMind show significant improvements over baselines on several benchmarks.
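The first pipeline stage, constructing a function graph from parameter correlations, can be sketched as follows. The paper does not publish its exact correlation metric, so this illustration uses a simple stand-in: draw a directed edge from tool A to tool B whenever an output parameter of A matches an input parameter of B. The tool specs (`search_flights`, `book_flight`, `cancel_booking`) are hypothetical examples, not tools from the dataset.

```python
from itertools import permutations

# Hypothetical tool specs: each tool lists its input and output parameter names.
TOOLS = {
    "search_flights": {"inputs": {"origin", "destination", "date"},
                       "outputs": {"flight_id", "price"}},
    "book_flight":    {"inputs": {"flight_id", "passenger_name"},
                       "outputs": {"booking_id"}},
    "cancel_booking": {"inputs": {"booking_id"},
                       "outputs": {"refund_amount"}},
}

def build_function_graph(tools):
    """Directed edge a -> b when some output of a can feed an input of b
    (name-overlap heuristic standing in for the paper's correlation metric)."""
    edges = {}
    for a, b in permutations(tools, 2):
        shared = tools[a]["outputs"] & tools[b]["inputs"]
        if shared:
            edges.setdefault(a, []).append((b, sorted(shared)))
    return edges

graph = build_function_graph(TOOLS)
# search_flights feeds book_flight via flight_id;
# book_flight feeds cancel_booking via booking_id.
```

Paths in such a graph (e.g. search → book → cancel) yield tool-call chains whose arguments are mutually consistent, which is what makes the simulated multi-turn trajectories coherent.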
Problem

Research questions and friction points this paper is trying to address.

Scarcity of high-quality trajectories hinders development of stronger LLM agents
Existing works overlook turn-level errors that propagate during training
Need for large-scale tool-use dataset with fine-grained quality validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale tool-agentic dataset with synthetic instances
Multi-agent framework simulating user-assistant-tool interactions
Fine-grained turn-level filtering for high-quality reasoning traces
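The turn-level filtering in the last point can be sketched roughly as follows, assuming a per-turn judge has already annotated each step; the `ok` and `corrects_previous` fields are hypothetical stand-ins for the paper's actual turn-level verdicts. The key idea is that an erroneous turn is retained when the following turn explicitly corrects it, so the self-corrective reasoning signal survives filtering.

```python
def filter_turns(trajectory):
    """Drop turns judged erroneous, except when the next turn corrects
    the error, in which case both turns are kept as a self-correction pair."""
    kept = []
    for i, turn in enumerate(trajectory):
        nxt = trajectory[i + 1] if i + 1 < len(trajectory) else None
        if turn["ok"] or (nxt is not None and nxt.get("corrects_previous")):
            kept.append(turn)
    return kept

traj = [
    {"role": "assistant", "text": "call get_weather(city='Paris')", "ok": True},
    {"role": "assistant", "text": "call book_hotel(city='Pari')", "ok": False},
    {"role": "assistant", "text": "retry book_hotel(city='Paris')",
     "ok": True, "corrects_previous": True},
    {"role": "assistant", "text": "call unrelated_tool()", "ok": False},
]
filtered = filter_turns(traj)  # keeps the first three turns, drops the last
```

Trajectory-level validation alone would either accept or reject this whole dialogue; the turn-level pass instead salvages the good prefix while discarding the uncorrected final error.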
Authors

Chen Yang (Nanbeige Lab, BOSS Zhipin)
Ran Le (Nanbeige Lab, BOSS Zhipin)
Yun Xing (School of Computer Science and Engineering, Nanyang Technological University)
Zhenwei An (Nanbeige Lab, BOSS Zhipin)
Zongchao Chen (Nanbeige Lab, BOSS Zhipin)
Wayne Xin Zhao (Professor, Renmin University of China)
Yang Song (Nanbeige Lab, BOSS Zhipin)
Tao Zhang (Nanbeige Lab, BOSS Zhipin)