TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

📅 2024-12-20
🏛️ arXiv.org
📈 Citations: 4
Influential: 0

🤖 AI Summary
Standard supervised fine-tuning (SFT) overlooks the task-specific characteristics of tool invocation, which limits the tool-use capabilities of large language models (LLMs). To address this, we propose TL-Training, a task-feature-based training framework that (i) mitigates the adverse effect of suboptimal training data on tool-use behavior; (ii) dynamically adjusts token weights so that key tokens are prioritized during SFT; and (iii) adds a reward mechanism tailored to error categories, optimized via Proximal Policy Optimization (PPO). Trained with only 1,217 data points, CodeLLaMA-2-7B matches or surpasses both open- and closed-source LLMs on four open-source tool-use test sets. The method also improves robustness in noisy environments and general task performance.

📝 Abstract
Large language models (LLMs) achieve remarkable advancements by leveraging tools to interact with external environments, a critical step toward generalized AI. However, the standard supervised fine-tuning (SFT) approach, which relies on large-scale datasets, often overlooks task-specific characteristics in tool use, leading to performance bottlenecks. To address this issue, we analyze three existing LLMs and uncover key insights: training data can inadvertently impede tool-use behavior, token importance is distributed unevenly, and errors in tool calls fall into a small set of distinct categories. Building on these findings, we propose TL-Training, a task-feature-based framework that mitigates the effects of suboptimal training data, dynamically adjusts token weights to prioritize key tokens during SFT, and incorporates a robust reward mechanism tailored to error categories, optimized through proximal policy optimization. We validate TL-Training by training CodeLLaMA-2-7B and evaluating it on four diverse open-source test sets. Our results demonstrate that the LLM trained by our method matches or surpasses both open- and closed-source LLMs in tool-use performance using only 1,217 training data points. Additionally, our method enhances robustness in noisy environments and improves general task performance, offering a scalable and efficient paradigm for tool-use training in LLMs. The code and data are available at https://github.com/Junjie-Ye/TL-Training.
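
As a rough illustration of the token-weighting idea in the abstract, the sketch below computes a weighted negative log-likelihood in which selected "key" tokens count more than the rest. It is not the paper's implementation: the abstract does not state the weighting rule, so the fixed `key_weight`, the helper names `key_token_weights` and `weighted_nll`, and the example tokens are all assumptions made for illustration. In particular, the weights here are static, whereas the paper describes them as adjusted dynamically.

```python
def key_token_weights(tokens, key_tokens, key_weight=2.0):
    """Assign a larger weight to tokens treated as 'key' (hypothetical rule).

    The choice of which tokens are key, and the value of key_weight,
    are illustrative assumptions, not values from the paper.
    """
    return [key_weight if tok in key_tokens else 1.0 for tok in tokens]


def weighted_nll(token_logprobs, weights):
    """Weighted mean negative log-likelihood over the target tokens."""
    if len(token_logprobs) != len(weights):
        raise ValueError("token_logprobs and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(w * lp for w, lp in zip(weights, token_logprobs))
    return -weighted_sum / total_weight


# Hypothetical tool call: the tool name and the argument value are marked as key.
tokens = ["get_weather", "(", "city", "=", "Paris", ")"]
weights = key_token_weights(tokens, key_tokens={"get_weather", "Paris"})
```

With uniform weights, `weighted_nll` reduces to the ordinary mean token loss used in standard SFT, so the weights only change how much each position contributes to the gradient.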
Problem

Research questions and friction points this paper is trying to address.

Addresses performance bottlenecks in tool-use training for LLMs
Mitigates suboptimal training data effects on tool-use behavior
Improves robustness and efficiency in LLM tool-use training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Task-feature-based framework mitigates the effects of suboptimal training data
Dynamic token weighting prioritizes key tokens during SFT
Robust reward mechanism tailored to error categories, optimized via PPO
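
To make the error-category reward idea concrete, here is a minimal sketch of a reward function that classifies a generated tool call and returns a scalar per category. The category names (`format_error`, `wrong_tool`, `wrong_arguments`), the penalty values, and the JSON call format are hypothetical placeholders, since this page does not list the paper's actual categories or reward values. The PPO optimization step itself is not shown; the scalar returned here is only what such a trainer would consume.

```python
import json

# Hypothetical penalties per error category (illustrative values only).
PENALTIES = {
    "format_error": -1.0,
    "wrong_tool": -0.75,
    "wrong_arguments": -0.5,
}


def classify_call(predicted_text, reference):
    """Return 'correct' or the name of the first error category that applies."""
    try:
        call = json.loads(predicted_text)
    except json.JSONDecodeError:
        return "format_error"
    if not isinstance(call, dict) or "name" not in call:
        return "format_error"
    if call["name"] != reference["name"]:
        return "wrong_tool"
    if call.get("arguments") != reference["arguments"]:
        return "wrong_arguments"
    return "correct"


def reward(predicted_text, reference):
    """Map the error category of a generated tool call to a scalar reward."""
    category = classify_call(predicted_text, reference)
    return 1.0 if category == "correct" else PENALTIES[category]


# Hypothetical reference call used for scoring.
reference = {"name": "get_weather", "arguments": {"city": "Paris"}}
```

Checking the categories in a fixed order (format, then tool, then arguments) means each output receives exactly one label, which keeps the reward easy to interpret.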