Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

📅 2024-06-11

🏛️ Neural Information Processing Systems

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

Existing tool-augmented large language models (e.g., ToolLLaMA) apply supervised fine-tuning (SFT) only to the successful decision-tree paths produced during multi-step reasoning. They discard failed paths, which carry useful error-correction signals, and this limits the learning space. Method: We propose TP-LLaMA, the first framework to systematically mine failed reasoning paths in decision trees and construct step-level preference data from them. Training has two stages: SFT on successful trajectories, followed by Direct Preference Optimization (DPO) on the preference data, so that failed explorations explicitly shape the policy. The decision trees are built via depth-first search, and the step-level preferences allow fine-grained policy correction. Contribution/Results: TP-LLaMA outperforms the baselines across almost all test scenarios, generalizes better to unseen APIs, and reasons more efficiently, which supports the value of failure signals for robust tool reasoning.

📝 Abstract

Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to improve their reasoning capabilities on complex tasks. This enables them to act as intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2023] utilizes the depth-first search-based decision tree (DFSDT) mechanism for multi-step reasoning with 16,000+ real-world APIs, effectively enhancing the performance of tool-augmented LLMs compared to traditional chain reasoning mechanisms. However, their approach only employs successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT), missing out on the potential learning opportunities from failed paths. Motivated by this gap, we propose an inference trajectory optimization framework based on preference learning. We first introduce a novel method for constructing step-wise preference data from tree-like expert trajectories, which leverages the previously ignored failed explorations in the decision trees. In the subsequent training phase, we first fine-tune the LLM with successful tool-usage expert trajectories and then apply direct preference optimization (DPO) with the preference data to update the LLM's policy, resulting in our ToolPrefer-LLaMA (TP-LLaMA) model. This approach not only enhances the utilization of original expert data but also broadens the learning space of the model. Our experiments demonstrate that by obtaining insights from errors in inference trees, TP-LLaMA outperforms the baselines by a large margin across almost all test scenarios and exhibits better generalization capabilities with unseen APIs. TP-LLaMA also shows superior reasoning efficiency compared to the baselines, making it more suitable for complex tool-usage reasoning tasks.
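
To make the step-wise preference construction described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Node` structure, the `is_success_leaf` flag, and the function names are hypothetical, and the sketch assumes that every sibling branching off the successful path is a failed exploration. At each step of the successful path, the child on that path is treated as preferred and each off-path sibling as dispreferred.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One step in a decision tree (hypothetical structure for illustration)."""
    action: str                                   # e.g. the API call chosen at this step
    children: List["Node"] = field(default_factory=list)
    is_success_leaf: bool = False                 # True only for a leaf that solved the task


def find_success_path(node: Node) -> Optional[List[Node]]:
    """Depth-first search for a root-to-leaf path ending in a success leaf."""
    if node.is_success_leaf:
        return [node]
    for child in node.children:
        sub_path = find_success_path(child)
        if sub_path is not None:
            return [node] + sub_path
    return None


def stepwise_pairs(root: Node) -> List[Tuple[List[str], str, str]]:
    """Return (history, chosen_action, rejected_action) triples.

    Assumption: any sibling of a step on the successful path is a failed branch.
    """
    path = find_success_path(root)
    if path is None:
        return []
    pairs = []
    for i in range(len(path) - 1):
        parent, chosen = path[i], path[i + 1]
        history = [n.action for n in path[: i + 1]]   # shared context up to this step
        for sibling in parent.children:
            if sibling is not chosen:
                pairs.append((history, chosen.action, sibling.action))
    return pairs


# Toy tree: the first API call fails, the second leads to a successful finish.
root = Node("query", [
    Node("call_api_v1", [Node("give_up")]),
    Node("call_api_v2", [Node("finish", is_success_leaf=True)]),
])
pairs = stepwise_pairs(root)
```

In this toy tree the only branching point is the root, so a single triple is produced, pairing `call_api_v2` (preferred) against `call_api_v1` (dispreferred) under the shared history `["query"]`.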

Problem

Research questions and friction points this paper is trying to address.

Enhancing tool-augmented LLMs by learning from failed inference paths

Optimizing inference trajectories with preference learning on decision-tree data for better reasoning

Improving generalization and efficiency in complex tool-usage tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Builds on ToolLLaMA's DFSDT mechanism for multi-step API reasoning

Optimizes inference via step-wise preference learning

Enhances model with failed path insights
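
The preference-learning stage named above relies on direct preference optimization (DPO). As a rough sketch of what that objective computes for one preference pair, the standard DPO loss is the negative log-sigmoid of a scaled margin between the policy-to-reference log-ratios of the preferred and dispreferred responses. The function name, argument names, and the default `beta=0.1` below are illustrative assumptions, not values taken from the paper.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    Inputs are summed token log-probabilities of each response under the
    trainable policy and under the frozen reference (SFT) model.
    beta=0.1 is an assumed default for illustration only.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(margin)) equals softplus(-margin); computed in a stable form.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


# When the policy equals the reference, the margin is zero and the loss is log(2).
baseline = dpo_loss(-5.0, -7.0, -5.0, -7.0)
# Raising the chosen response's log-probability relative to the reference lowers the loss.
improved = dpo_loss(-4.0, -7.0, -5.0, -7.0)
```

The loss falls as the policy assigns relatively more probability to the preferred step than the reference model does, which is how the failed branches influence the final policy.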

🔎 Similar Papers

Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?