LLM-assisted Mutation for Whitebox API Testing

📅 2025-04-08
🤖 AI Summary
Existing system-level API testing suffers from fitness plateaus: conventional coverage metrics provide no gradient toward strict branch conditions, so the search stagnates. This paper introduces MioHint, a white-box API testing framework that integrates large language models (LLMs) while working around their limited context length and short memory. MioHint combines static analysis (def-use analysis and function expansion) with LLM-based code understanding. It performs statement-level data-dependency analysis to retrieve the relevant code fragments across functions and files, then uses them to prompt the LLM for semantics-aware mutations of test inputs. Evaluated on 16 real-world REST API services, MioHint improves line coverage by an average of 4.95 percentage points over the EvoMaster baseline, improves mutation accuracy by a factor of 67, and covers more than 57% of hard-to-cover targets, compared with less than 10% for the baseline.

📝 Abstract
Cloud applications rely heavily on APIs to communicate with each other and exchange data. To ensure the reliability of cloud applications, cloud providers widely adopt API testing techniques. Unfortunately, existing API testing approaches struggle to reach targets guarded by strict conditions, a problem known as fitness plateaus, because coverage metrics provide no gradient toward them. To address this issue, we propose MioHint, a novel white-box API testing approach that leverages the code comprehension capabilities of Large Language Models (LLMs) to boost API testing. The key challenge of LLM-based API testing lies in system-level testing, which emphasizes the dependencies between requests and targets across functions and files, thereby making the entire codebase the object of analysis. However, feeding the entire codebase to an LLM is impractical due to its limited context length and short memory. MioHint addresses this challenge by synergizing static analysis with LLMs. We retrieve relevant code with data-dependency analysis at the statement level, including def-use analysis for variables used in the target and function expansion for subfunctions called by the target. To evaluate the effectiveness of our method, we conducted experiments across 16 real-world REST API services. The findings reveal that MioHint achieves an average absolute increase of 4.95% in line coverage compared to the baseline, EvoMaster, alongside a 67x improvement in mutation accuracy. Furthermore, our method covers over 57% of hard-to-cover targets, whereas the baseline covers less than 10%.
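
The retrieval step described in the abstract (def-use analysis for variables used in the target, plus function expansion for subfunctions it calls) can be illustrated with a minimal Python sketch. This is not MioHint's implementation. It assumes straight-line, top-level statements inside a single function, and the service code in `SOURCE`, the helper names, and the target line number are all hypothetical.

```python
import ast

# Hypothetical service code: two functions standing in for a multi-file codebase.
SOURCE = '''
def load_user(uid):
    return {"id": uid, "role": "admin" if uid == 42 else "guest"}

def handler(request):
    uid = int(request["id"])
    trace = request.get("trace")
    user = load_user(uid)
    role = user["role"]
    if role == "admin":
        return "secret"
    return "denied"
'''


def names(node, ctx):
    """Variable names read (ast.Load) or written (ast.Store) inside a node."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ctx)}


def called(node):
    """Names of functions that are called directly by name inside a node."""
    return {n.func.id for n in ast.walk(node)
            if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)}


def retrieve_context(source, func_name, target_line):
    """Return (lines the target depends on, source of called local functions)."""
    tree = ast.parse(source)
    funcs = {f.name: f for f in tree.body if isinstance(f, ast.FunctionDef)}
    body = funcs[func_name].body
    target = next(s for s in body if s.lineno == target_line)
    # For an `if` target, only the condition matters, not the guarded block.
    cond = target.test if isinstance(target, ast.If) else target

    wanted = names(cond, ast.Load)   # variables whose definitions we still need
    kept = [target_line]
    callees = called(cond)
    # Def-use analysis: walk backwards and keep statements defining a wanted variable.
    for stmt in reversed([s for s in body if s.lineno < target_line]):
        defined = names(stmt, ast.Store)
        if defined & wanted:
            kept.append(stmt.lineno)
            wanted = (wanted - defined) | names(stmt, ast.Load)
            callees |= called(stmt)

    # Function expansion: attach the source of called functions defined in this code.
    expanded = {name: ast.get_source_segment(source, funcs[name])
                for name in callees if name in funcs}
    return sorted(kept), expanded


kept_lines, expanded = retrieve_context(SOURCE, "handler", target_line=10)
lines = SOURCE.splitlines()
for n in kept_lines:
    print(n, lines[n - 1])
for src in expanded.values():
    print(src)
```

In this toy case the slice keeps the statements defining `uid`, `user`, and `role`, drops the unrelated `trace` assignment, and pulls in the body of `load_user`. A real analysis would also need to handle branches, loops, aliasing, and calls that cross files.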
Problem

Research questions and friction points this paper is trying to address.

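Overcoming fitness plateaus in API testing, where coverage metrics give the search no gradient toward strict conditions.
Using LLM-assisted mutation to handle system-level dependencies between requests and targets across functions and files.
Working around LLM context-length and memory limits, which make it impractical to feed an entire codebase to the model.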
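
A toy illustration of a fitness plateau, in Python. It is not EvoMaster's actual fitness function. The coupon string, the numeric bound, and both helper functions are hypothetical, chosen only to contrast a signal that has a gradient with one that does not.

```python
def numeric_distance(x, bound=1000):
    """Distance to satisfying `x == bound`: it shrinks as x approaches the bound."""
    return abs(x - bound)


def coupon_covered(payload):
    """Coverage signal for a strict check: 1 if the guarded branch is taken, else 0."""
    return 1 if payload.get("coupon") == "SPRING-25-VIP" else 0


# A numeric condition rewards inputs that get closer, so a search can follow it.
numeric_scores = [numeric_distance(x) for x in (0, 500, 900, 999)]

# A strict condition judged by coverage alone scores every miss the same,
# even when the candidate is one character away from the required value.
coupon_scores = [coupon_covered({"coupon": c})
                 for c in ("", "SPRING", "SPRING-25", "SPRING-25-VIQ")]

print(numeric_scores)  # [1000, 500, 100, 1]
print(coupon_scores)   # [0, 0, 0, 0]
```

The second list is the plateau: with no difference between candidates, mutation has nothing to climb. Reading the condition in the source code is one way to find the required value directly.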
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted mutation for API testing
Static analysis synergized with LLMs
Data-dependency analysis for code retrieval
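
One way to picture how retrieved code could be packed into a limited context window is the sketch below. The summary does not describe MioHint's prompt format, so the function name, the prompt wording, the character-based budget, and the example snippets are all assumptions, and the LLM call itself is left out.

```python
import json


def build_mutation_prompt(target, snippets, current_input, budget_chars=600):
    """Assemble a prompt from the target, the current input, and as many
    retrieved snippets as fit in the budget (snippets are assumed to be
    ordered from most to least relevant)."""
    header = (
        "Suggest a new request body that reaches the target statement.\n"
        "Target statement:\n" + target + "\n"
        "Current request body:\n" + json.dumps(current_input) + "\n"
        "Relevant code (most relevant first):"
    )
    parts, used = [header], len(header)
    for snippet in snippets:
        cost = len(snippet) + 1  # +1 for the newline that joins the parts
        if used + cost > budget_chars:
            break  # stop at the first snippet that would exceed the budget
        parts.append(snippet)
        used += cost
    return "\n".join(parts)


snippets = [
    'role = user["role"]',
    "user = load_user(uid)",
    "# unrelated helper\n" + "x = 0\n" * 200,  # too large to fit the budget
]
prompt = build_mutation_prompt('if role == "admin":', snippets, {"id": 7})
print(prompt)
```

Here the two short data-dependency snippets are included and the oversized one is dropped. A production version would more likely budget in tokens than in characters.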