🤖 AI Summary
Existing tool learning approaches rely on real API calls, incurring high computational costs, exhibiting poor generalization, and lacking multi-hop reasoning and self-reflection capabilities.
Method: We propose the first real-API-call-free framework for synthesizing multi-hop search tool learning data. Given (question, golden context, answer) triples, it automatically generates high-quality, diverse training data via lightweight virtual tool modeling. The method combines multi-hop reasoning chain generation with self-reflection enhancement, and establishes a multi-layer verification system, combining rule-based and model-based checks, to ensure data fidelity.
Contribution/Results: Experiments demonstrate that an 8B-parameter model trained on our synthetic data surpasses GPT-4o across multiple benchmarks. To foster reproducibility and community advancement, we publicly release both the code and the dataset.
📝 Abstract
Training LLMs to invoke tools and leverage retrieved information necessitates high-quality, diverse data. However, existing pipelines for synthetic data generation often rely on tens of thousands of real API calls to enhance generalization, incurring prohibitive costs while lacking multi-hop reasoning and self-reflection. To address these limitations, we introduce ToolForge, an automated synthesis framework that achieves strong real-world tool-calling performance by constructing only a small number of virtual tools, eliminating the need for real API calls. ToolForge leverages a (question, golden context, answer) triple to synthesize large-scale tool-learning data specifically designed for multi-hop search scenarios, further enriching the generated data through multi-hop reasoning and self-reflection mechanisms. To ensure data fidelity, we employ a Multi-Layer Validation Framework that integrates both rule-based and model-based assessments. Empirical results show that a model with only 8B parameters, when trained on our synthesized data, outperforms GPT-4o on multiple benchmarks. Our code and dataset are publicly available at https://github.com/Buycar-arb/ToolForge.
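
The idea of layering rule-based and model-based checks over a synthesized sample can be sketched as follows. This is only an illustrative sketch, not ToolForge's actual implementation: the `Sample` and `ToolCall` types, the `VIRTUAL_TOOLS` registry, the specific rules, and the `judge` callable (standing in for a model-based assessor) are all hypothetical names assumed for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ToolCall:
    name: str
    arguments: dict


@dataclass
class Sample:
    question: str
    golden_context: str
    answer: str
    tool_calls: List[ToolCall]
    final_response: str


# Hypothetical registry of virtual tools: tool name -> required argument names.
VIRTUAL_TOOLS = {"search": {"query"}}


def rule_check(sample: Sample) -> List[str]:
    """Cheap deterministic layer: return a list of rule violations."""
    errors = []
    if not sample.tool_calls:
        errors.append("no tool calls")
    for call in sample.tool_calls:
        required = VIRTUAL_TOOLS.get(call.name)
        if required is None:
            errors.append(f"unknown tool: {call.name}")
        elif not required.issubset(call.arguments):
            errors.append(f"missing arguments for {call.name}")
    # Assumed rule: the final response must mention the reference answer.
    if sample.answer.lower() not in sample.final_response.lower():
        errors.append("final response does not contain the reference answer")
    return errors


def validate(sample: Sample, judge: Callable[[Sample], bool]) -> bool:
    """Run the rule layer first; consult the model-based judge only if it passes."""
    if rule_check(sample):
        return False
    return judge(sample)
```

Running the deterministic rules first means the more expensive model-based judge is only called on samples that are already structurally valid; whether the real framework orders its layers this way is an assumption of the sketch.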