Can Tool-augmented Large Language Models be Aware of Incomplete Conditions?

📅 2024-06-18
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This study investigates whether tool-augmented large language models (LLMs) are aware of incomplete conditions, i.e., user inputs that lack essential information or environments that lack the required tools. Blind tool invocation under such conditions severely compromises reliability, so the authors formally define and systematically evaluate LLMs' ability to recognize the preconditions for tool use: information completeness and tool completeness. They introduce the first benchmark dedicated to recognizing incomplete conditions, using a controllable data-synthesis approach that derives diverse incomplete scenarios from the ToolAlpaca and API-Bank datasets, and a black-box evaluation framework grounded in prompt engineering and behavioral analysis, complemented by human-annotated comparative experiments. Results show that state-of-the-art LLMs achieve under 40% precondition-recognition accuracy, far below human performance (over 92%), exposing a critical robustness bottleneck in current tool-augmented reasoning systems.

📝 Abstract
Recent advancements in integrating large language models (LLMs) with tools have allowed the models to interact with real-world environments. However, these tool-augmented LLMs often encounter incomplete scenarios when users provide partial information or the necessary tools are unavailable. Recognizing and managing such scenarios is crucial for LLMs to ensure their reliability, but this remains understudied. This study examines whether LLMs can identify incomplete conditions and appropriately determine when to refrain from using tools. To this end, we construct a dataset by manipulating instances from two existing datasets, removing either the necessary tools or the essential information for tool invocation. Our experiments show that LLMs often struggle to identify the absence of information required to utilize specific tools and to recognize the absence of appropriate tools. We further analyze model behaviors in different environments and compare their performance against humans. Our research can contribute to advancing reliable LLMs by addressing common scenarios during interactions between humans and LLMs. Our code and dataset will be publicly available.
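The dataset construction described above can be sketched in code. This is a minimal illustration, not the paper's released pipeline: the instance schema, field names (`utterance`, `tools`, `required_tool`, `arguments`), and labels are assumptions made for the example, chosen only to show the two manipulations (removing the required tool vs. removing essential information).

```python
# Hypothetical sketch of the paper's data manipulation: from a complete
# tool-use instance, derive "incomplete" variants by removing either the
# required tool or an essential piece of user-provided information.
# All field names are illustrative assumptions, not the paper's schema.
import copy

def make_tool_incomplete(instance):
    """Drop the tool the instruction requires, so no appropriate tool exists."""
    variant = copy.deepcopy(instance)
    variant["tools"] = [t for t in variant["tools"]
                        if t["name"] != variant["required_tool"]]
    variant["label"] = "missing_tool"
    return variant

def make_info_incomplete(instance, argument):
    """Remove one essential argument's value from the user utterance."""
    variant = copy.deepcopy(instance)
    variant["utterance"] = variant["utterance"].replace(
        variant["arguments"][argument], "").strip()
    del variant["arguments"][argument]
    variant["label"] = "missing_information"
    return variant

# A complete instance (toy example):
instance = {
    "utterance": "Book a flight to Paris on June 18",
    "tools": [{"name": "book_flight"}, {"name": "get_weather"}],
    "required_tool": "book_flight",
    "arguments": {"date": "June 18"},
    "label": "complete",
}

no_tool = make_tool_incomplete(instance)   # only get_weather remains
no_info = make_info_incomplete(instance, "date")  # date removed from utterance
```

A model shown `no_tool` or `no_info` should refrain from invoking a tool; a model shown the original `instance` should invoke `book_flight`, which is the behavioral contrast the benchmark evaluates.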
Problem

Research questions and friction points this paper is trying to address.

Detect incomplete conditions in tool-augmented LLMs
Determine when to avoid tool use due to missing information
Assess LLM reliability in partial information scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects incomplete conditions in tool-augmented LLMs
Analyzes model behavior in varied environments
Compares LLM performance with human responses