Improving Large Language Models Function Calling and Interpretability via Guided-Structured Templates

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently fail in real-world tool invocation due to intent misinterpretation, incorrect parsing of tool documentation, and parameterization errors. To address this, we propose a curriculum-inspired structured reasoning framework that replaces free-form chain-of-thought prompting with guided, template-based reasoning, explicitly decoupling the process into three sequential stages: *intent parsing*, *tool matching*, and *parameter generation*. The framework uses stepwise structured prompts to jointly model user goals and tool functionalities, which improves invocation robustness and makes decisions easier to interpret. Evaluated across multiple state-of-the-art models (e.g., LLaMA-3, Qwen2) and benchmarks (ToolBench, API-Bank), it achieves 3–12% relative improvements over strong baselines. The core contribution is turning implicit, unstructured reasoning into an explicit, traceable, and modular pipeline that balances accuracy with transparency and auditability.
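
The three-stage decomposition described above can be illustrated with a small prompt builder. This is only a sketch under stated assumptions: the stage names come from the summary, but the instruction wording, the `build_guided_prompt` helper, and the tool dictionary layout are hypothetical and are not the paper's actual template.

```python
# Stage names follow the summary; everything else here is a hypothetical illustration.
STAGES = ("intent_parsing", "tool_matching", "parameter_generation")

# Hypothetical per-stage instructions (the paper's real template text is not shown here).
STAGE_INSTRUCTIONS = {
    "intent_parsing": "Restate what the user wants and list any constraints.",
    "tool_matching": "Pick the tool whose documentation best fits that goal and say why.",
    "parameter_generation": "Fill each required parameter from the request and output the call as JSON.",
}


def build_guided_prompt(user_request: str, tools: list[dict]) -> str:
    """Assemble a prompt that walks the model through the stages in a fixed order."""
    tool_lines = [f"- {t['name']}: {t['description']}" for t in tools]
    step_lines = [
        f"Step {i} ({stage}): {STAGE_INSTRUCTIONS[stage]}"
        for i, stage in enumerate(STAGES, start=1)
    ]
    return "\n".join(
        [
            "Available tools:",
            *tool_lines,
            "",
            f"User request: {user_request}",
            "",
            "Answer each step in order, under its own heading:",
            *step_lines,
        ]
    )


if __name__ == "__main__":
    demo_tools = [{"name": "get_weather", "description": "Current weather for a city."}]
    print(build_guided_prompt("Is it raining in Paris?", demo_tools))
```

Keeping the stages in a fixed, named order is what makes the model's output traceable: each section of the response can be inspected separately when a call goes wrong.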

📝 Abstract
Large language models (LLMs) have demonstrated strong reasoning and tool-use capabilities, yet they often fail in real-world tool interactions due to incorrect parameterization, poor tool selection, or misinterpretation of user intent. These issues often stem from an incomplete understanding of user goals and inadequate comprehension of tool documentation. While Chain-of-Thought (CoT) prompting has proven effective for enhancing reasoning in general contexts, our analysis reveals that free-form CoT is insufficient and sometimes counterproductive for structured function-calling tasks. To address this, we introduce a curriculum-inspired framework that leverages structured reasoning templates to guide LLMs through more deliberate step-by-step instructions for generating function calls. Experimental results show that our method reduces tool-use errors, achieving 3-12% relative improvements over strong baselines across diverse model series and approaches. Moreover, our framework enhances the robustness, interpretability, and transparency of tool-using agents, advancing the development of more reliable AI assistants for real-world applications.
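
Because the abstract reports relative rather than absolute improvements, a short worked example may help in reading the 3-12% figure. The sketch below assumes the metric is an accuracy-style score where higher is better; the abstract does not say which metric is used, and the numbers are hypothetical, not results from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Gain of `improved` over `baseline`, expressed as a fraction of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical scores chosen only to illustrate the arithmetic.
baseline_score = 0.50
guided_score = 0.53

absolute_gain = guided_score - baseline_score
relative_gain = relative_improvement(baseline_score, guided_score)
print(f"absolute gain: {absolute_gain:.3f}, relative gain: {relative_gain:.1%}")
```

With these made-up scores, a gain of three points on a baseline of 0.50 corresponds to a relative improvement of about 6%, which is why relative figures look larger than absolute ones when the baseline is low.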
Problem

Research questions and friction points this paper is trying to address.

Addresses incorrect parameterization and poor tool selection by LLMs
Reduces misinterpretation of user intent in function-calling tasks
Mitigates incomplete understanding of user goals and tool documentation
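
Two of the failure types listed above, wrong tool selection and incorrect parameterization, can be detected mechanically by checking a proposed call against the tool's declared parameters. The following is a minimal sketch of such a check; the `check_call` helper, the `TOOLS` registry, and its schema layout are assumptions made for illustration and do not come from the paper. Intent misinterpretation is not covered, since it cannot be caught by schema checks alone.

```python
def check_call(call: dict, tools: dict) -> list[str]:
    """Return the problems found in a proposed call; an empty list means it passed."""
    spec = tools.get(call.get("name"))
    if spec is None:
        # The model named a tool that is not in the registry.
        return [f"tool selection: unknown tool {call.get('name')!r}"]

    args = call.get("arguments", {})
    problems = []

    # Every required parameter must be present and have the declared type.
    for param, expected_type in spec["required"].items():
        if param not in args:
            problems.append(f"parameterization: missing {param!r}")
        elif not isinstance(args[param], expected_type):
            problems.append(f"parameterization: {param!r} should be {expected_type.__name__}")

    # Arguments that the tool does not declare are flagged as well.
    allowed = set(spec["required"]) | set(spec.get("optional", {}))
    for param in args:
        if param not in allowed:
            problems.append(f"parameterization: unexpected {param!r}")
    return problems


# Hypothetical tool registry: parameter name -> expected Python type.
TOOLS = {"get_weather": {"required": {"city": str}, "optional": {"units": str}}}

if __name__ == "__main__":
    print(check_call({"name": "get_weather", "arguments": {"city": "Paris"}}, TOOLS))
    print(check_call({"name": "get_forecast", "arguments": {}}, TOOLS))
```

Labeling each problem with its category makes it possible to count how often a model fails at tool selection versus parameter filling.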
Innovation

Methods, ideas, or system contributions that make the work stand out.

Guided-structured templates for function calling
Curriculum-inspired framework with step-by-step instructions
Improved robustness and interpretability of tool-using agents

Authors

Hy Dang
University of Notre Dame
Natural Language Processing, Machine Learning, Data Mining, Health-Related Problem

Tianyi Liu
Amazon

Zhuofeng Wu
Amazon

Jingfeng Yang
Amazon

Haoming Jiang
OpenAI; Ex-Amazon; Georgia Institute of Technology
Machine Learning

Tao Yang
Amazon

Pei Chen
Amazon

Zhengyang Wang
Amazon

Helen Wang
Amazon

Huasheng Li
Amazon

Bing Yin
Amazon.com
NLP, Information Retrieval, Deep Learning, Knowledge Graphs

Meng Jiang
University of Notre Dame, Amazon