🤖 AI Summary
Existing LLM agents frequently fail when invoking enterprise REST APIs to execute complex workflows, primarily due to ambiguous API documentation, intricate input schemas, and nonstandardized response formats; current tool-use benchmarks inadequately reflect such real-world challenges. To address this, we propose the first REST API tool-readiness evaluation framework specifically designed for LLM agents. Our approach introduces a novel three-category error taxonomy—input misinterpretation, inconsistent output handling, and schema mismatch—and integrates API schema analysis, automated test case generation, natural language instruction synthesis, and tool definition enhancement. Evaluated on 750 systematically constructed test cases, our framework identifies prevalent failure modes, enabling rapid API diagnostics and targeted toolification. Experimental results demonstrate substantial improvements in invocation success rate and robustness across diverse enterprise APIs.
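The three-category taxonomy lends itself to a simple programmatic check. Below is a minimal, hypothetical sketch of how an evaluator might bucket a failed invocation into the three categories; the function name, heuristics, and inputs are our illustrative assumptions, not the paper's implementation:

```python
from enum import Enum

class ToolError(Enum):
    """Hypothetical encoding of the paper's three-category error taxonomy."""
    INPUT_MISINTERPRETATION = "input_misinterpretation"
    OUTPUT_HANDLING = "inconsistent_output_handling"
    SCHEMA_MISMATCH = "schema_mismatch"

def classify_failure(expected_call: dict, actual_call: dict,
                     response_handled_ok: bool) -> ToolError | None:
    """Illustrative heuristic: compare the agent's tool call against the
    expected invocation derived from a generated test case."""
    if set(actual_call) != set(expected_call):
        # Wrong or missing parameter names: the agent's call does not
        # match the declared input schema.
        return ToolError.SCHEMA_MISMATCH
    if any(actual_call[k] != expected_call[k] for k in expected_call):
        # Right fields, wrong values: the instruction was misread.
        return ToolError.INPUT_MISINTERPRETATION
    if not response_handled_ok:
        # The call itself was correct, but the response was mishandled.
        return ToolError.OUTPUT_HANDLING
    return None  # successful invocation
```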
📝 Abstract
Large Language Models (LLMs) are enabling autonomous agents to perform complex workflows using external tools or functions, often provided via REST APIs in enterprise systems. However, directly utilizing these APIs as tools poses challenges due to their complex input schemas, elaborate responses, and often ambiguous documentation. Current benchmarks for tool testing do not adequately address these complexities, leaving a critical gap in evaluating API readiness for agent-driven automation. In this work, we present a novel testing framework aimed at evaluating and enhancing the readiness of REST APIs to function as tools for LLM-based agents. Our framework transforms APIs into tools, generates comprehensive test cases for the APIs, translates test cases into natural language instructions suitable for agents, enriches tool definitions, and evaluates the agent's ability to correctly invoke the API and process its inputs and responses. To provide actionable insights, we analyze the outcomes of 750 test cases, presenting a detailed taxonomy of errors, including input misinterpretation, output handling inconsistencies, and schema mismatches. Additionally, we classify these test cases to streamline debugging and refinement of tool integrations. This work offers a foundational step toward enabling enterprise APIs as tools, improving their usability in agent-based applications.
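To make the pipeline concrete, here is a minimal sketch of two of its stages: deriving a test case from an API schema and synthesizing a natural-language instruction for the agent. All names (the helper functions, the sample `getOrderStatus` operation) are assumptions made for illustration; the paper's generator is more systematic than this toy version:

```python
def generate_test_case(openapi_op: dict) -> dict:
    """Toy test-case generator: build one positive test from an OpenAPI
    operation by filling required parameters with their example values."""
    params = {
        p["name"]: p.get("example", "PLACEHOLDER")
        for p in openapi_op.get("parameters", [])
        if p.get("required")
    }
    return {"operation_id": openapi_op["operationId"], "arguments": params}

def to_instruction(test_case: dict) -> str:
    """Render a generated test case as a natural-language instruction."""
    args = ", ".join(f"{k}={v!r}" for k, v in test_case["arguments"].items())
    return f"Call {test_case['operation_id']} with {args} and report the result."

# Hypothetical OpenAPI operation for demonstration purposes.
op = {
    "operationId": "getOrderStatus",
    "parameters": [{"name": "order_id", "required": True, "example": "A-1042"}],
}
case = generate_test_case(op)
print(to_instruction(case))
# -> Call getOrderStatus with order_id='A-1042' and report the result.
```

The resulting instruction, the expected invocation, and the agent's actual tool call would then feed an evaluation step such as the taxonomy-based classifier sketched above.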