🤖 AI Summary
To address two limitations of large language models (LLMs) in tool invocation, namely restricted tool-calling ability and low-quality, semantically impoverished instruction-tuning data, this paper proposes a knowledge graph–based method for generating high-quality instruction data. For the first time, it leverages a manually curated knowledge graph to automatically extract semantically coherent query paths, map them to tool-call sequences, and parse them into structured solution steps, thereby synthesizing semantically rich, logically clear instruction data. The approach achieves effective supervised fine-tuning with only a small amount of synthetic data, significantly improving LLMs' tool-selection accuracy and task-completion rates across multiple tool-learning benchmarks. Its core contribution is an interpretable knowledge graph–to–tool-instruction generation paradigm that combines high data efficiency with strong generalization.
📝 Abstract
Teaching large language models (LLMs) to use tools is crucial for improving their problem-solving abilities and expanding their applications. However, effectively using tools is challenging because it requires a deep understanding of tool functionalities and user intentions. Previous methods relied mainly on LLMs to generate instruction data, but the quality of these data was often insufficient. In this paper, we propose a new method that uses knowledge graphs to generate high-quality instruction data for LLMs. Knowledge graphs are manually curated datasets rich in semantic information. We begin by extracting various query pathways from a given knowledge graph, which are transformed into a broad spectrum of user queries. We then translate the relationships between entities into actionable tools and parse the pathways of each query into detailed solution steps, thereby creating high-quality instruction data. Our experiments show that fine-tuning on just a small sample of this synthetic data can significantly improve the tool utilization and overall capabilities of LLMs.
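The pipeline described in the abstract (sample a path from a knowledge graph, turn it into a user query, map each relation to a tool, and record the tool calls as solution steps) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the toy graph, the relation-to-tool mapping, and all entity, relation, and tool names are invented for the example.

```python
# Hypothetical sketch of KG-to-instruction-data generation. The graph,
# relations, and tool names below are invented for illustration only.

# Toy knowledge graph: each entity maps to a list of (relation, target) edges.
KG = {
    "Inception": [("directed_by", "Christopher Nolan")],
    "Christopher Nolan": [("born_in", "London")],
    "London": [("located_in", "United Kingdom")],
}

# Assumed mapping from KG relations to callable tools.
RELATION_TO_TOOL = {
    "directed_by": "get_director",
    "born_in": "get_birthplace",
    "located_in": "get_country",
}

def extract_path(start, hops):
    """Walk up to `hops` edges from `start`, collecting traversed triples."""
    path, node = [], start
    for _ in range(hops):
        edges = KG.get(node)
        if not edges:
            break
        relation, target = edges[0]  # deterministic pick for the sketch
        path.append((node, relation, target))
        node = target
    return path

def path_to_instruction(path):
    """Turn a KG path into one (query, tool-call steps) training example."""
    head = path[0][0]
    relations = " -> ".join(rel for _, rel, _ in path)
    query = f"Starting from '{head}', resolve the chain: {relations}."
    steps = [
        {"step": i + 1, "tool": RELATION_TO_TOOL[rel], "input": src, "output": dst}
        for i, (src, rel, dst) in enumerate(path)
    ]
    return {"query": query, "steps": steps}

example = path_to_instruction(extract_path("Inception", hops=3))
```

Each multi-hop path thus yields a query plus an ordered sequence of tool calls, which is the shape of supervised fine-tuning data the paper targets; a real system would sample many diverse paths and verbalize the queries with richer natural language.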