SAINT: Service-level Integration Test Generation with Program Analysis and LLM-based Agents

📅 2025-11-17
🤖 AI Summary
Existing RESTful API testing tools rely either on OpenAPI specifications, which are often unavailable in enterprise settings, or on fuzzing, which limits their ability to generate functional test cases that cover business logic, database interactions, and end-to-end scenarios. This paper proposes SAINT, a service-level integration test generation approach for Java that combines static program analysis with LLM-based agents to construct endpoint models and operation dependency graphs, and then iteratively generates semantically coherent, scenario-driven tests via a "plan, act, reflect" loop. Its key innovation lies in coupling code structure, data-flow semantics, and natural language reasoning so that tests align with business intent. Evaluated on eight Java applications, including a proprietary enterprise application, SAINT significantly improves branch coverage (average +28.6%) and defect detection rates; the generated scenario-based tests received strong endorsement in a developer survey.

📝 Abstract
Enterprise applications are typically tested at multiple levels, with service-level testing playing an important role in validating application functionality. Existing service-level testing tools, especially for RESTful APIs, often employ fuzzing and/or depend on OpenAPI specifications which are not readily available in real-world enterprise codebases. Moreover, these tools are limited in their ability to generate functional tests that effectively exercise meaningful scenarios. In this work, we present SAINT, a novel white-box testing approach for service-level testing of enterprise Java applications. SAINT combines static analysis, large language models (LLMs), and LLM-based agents to automatically generate endpoint and scenario-based tests. The approach builds two key models: an endpoint model, capturing syntactic and semantic information about service endpoints, and an operation dependency graph, capturing inter-endpoint ordering constraints. SAINT then employs LLM-based agents to generate tests. Endpoint-focused tests aim to maximize code and database interaction coverage. Scenario-based tests are synthesized by extracting application use cases from code and refining them into executable tests via planning, action, and reflection phases of the agentic loop. We evaluated SAINT on eight Java applications, including a proprietary enterprise application. Our results illustrate the effectiveness of SAINT in coverage, fault detection, and scenario generation. Moreover, a developer survey provides strong endorsement of the scenario-based tests generated by SAINT. Overall, our work shows that combining static analysis with agentic LLM workflows enables more effective, functional, and developer-aligned service-level test generation.
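The abstract does not say how the operation dependency graph is represented, so the following is only a minimal sketch under stated assumptions: operations are plain strings, an edge means "must be invoked before", and a valid call sequence is obtained with a standard topological sort (Kahn's algorithm). The class name, method names, and endpoint paths are hypothetical and are not taken from SAINT.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of inter-endpoint ordering constraints;
// not SAINT's actual data structure.
class OperationDependencyGraph {
    // Maps each operation to the operations that must run after it.
    private final Map<String, Set<String>> successors = new LinkedHashMap<>();

    void addOperation(String op) {
        successors.computeIfAbsent(op, k -> new LinkedHashSet<>());
    }

    // Records that `before` must be invoked earlier than `after`.
    void addDependency(String before, String after) {
        addOperation(before);
        addOperation(after);
        successors.get(before).add(after);
    }

    // Returns one call order that respects every recorded constraint
    // (Kahn's algorithm); throws if the constraints are cyclic.
    List<String> callOrder() {
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String op : successors.keySet()) {
            inDegree.put(op, 0);
        }
        for (Set<String> targets : successors.values()) {
            for (String target : targets) {
                inDegree.merge(target, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String op = ready.poll();
            order.add(op);
            for (String next : successors.get(op)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != successors.size()) {
            throw new IllegalStateException("ordering constraints contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        OperationDependencyGraph g = new OperationDependencyGraph();
        // Hypothetical endpoints: an owner must exist before it is read or deleted.
        g.addDependency("POST /owners", "GET /owners/{id}");
        g.addDependency("POST /owners", "POST /owners/{id}/pets");
        g.addDependency("POST /owners/{id}/pets", "DELETE /owners/{id}");
        g.addDependency("GET /owners/{id}", "DELETE /owners/{id}");
        // prints [POST /owners, GET /owners/{id}, POST /owners/{id}/pets, DELETE /owners/{id}]
        System.out.println(g.callOrder());
    }
}
```

An order like this could seed a scenario-based test, with each step's response (for example, a created ID) feeding the next request. How SAINT actually derives and uses the ordering is described in the paper, not here.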
Problem

Research questions and friction points this paper is trying to address.

Generating service-level tests without requiring OpenAPI specifications
Creating functional tests that exercise meaningful application scenarios
Overcoming limitations of existing tools in enterprise Java applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines static analysis with LLM-based agents
Builds endpoint models and operation dependency graphs
Generates tests through agentic planning and reflection
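As a rough illustration of the planning, action, and reflection phases, the sketch below shows one way such a loop could be structured. Everything in it is an assumption made for illustration: the `refine` method, its parameters, and the stubbed planner, executor, and reflector are hypothetical stand-ins (a real system would call an LLM and a test runner), and none of it is SAINT's implementation.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical skeleton of a plan/act/reflect refinement loop.
class AgenticLoopSketch {

    // plan:    (scenario, feedback) -> candidate test source
    // act:     candidate -> failure message, or empty if the test ran cleanly
    // reflect: failure message -> feedback for the next planning round
    static String refine(String scenario,
                         BiFunction<String, String, String> plan,
                         Function<String, Optional<String>> act,
                         Function<String, String> reflect,
                         int maxRounds) {
        String feedback = "";
        String candidate = null;
        for (int round = 0; round < maxRounds; round++) {
            candidate = plan.apply(scenario, feedback);       // planning
            Optional<String> failure = act.apply(candidate);  // action
            if (!failure.isPresent()) {
                return candidate;                             // executable test found
            }
            feedback = reflect.apply(failure.get());          // reflection
        }
        return candidate; // best effort after maxRounds (null if maxRounds <= 0)
    }

    public static void main(String[] args) {
        // Deterministic stubs standing in for LLM and test-runner calls.
        String result = refine(
            "register owner, then add a pet",
            (scenario, fb) -> fb.isEmpty() ? "draft" : "draft+fix",
            test -> test.contains("fix")
                ? Optional.<String>empty()
                : Optional.of("404 on GET /owners/{id}"),
            failure -> "address: " + failure,
            3);
        System.out.println(result); // prints draft+fix
    }
}
```

Bounding the loop with `maxRounds` is a design choice of this sketch, not something stated in the source; it simply keeps a non-converging refinement from running indefinitely.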
Rangeet Pan
Staff Research Scientist, IBM Research, Yorktown Heights
Software Engineering, Programming Language, Large Language Models
Raju Pavuluri
IBM Research, Yorktown Heights, NY, USA
Ruikai Huang
Georgia Institute of Technology, Atlanta, GA, USA
Rahul Krishna
IBM Research, Yorktown Heights, NY, USA
Tyler Stennett
Georgia Institute of Technology, Atlanta, GA, USA
Alessandro Orso
University of Georgia, Athens, GA, USA
Saurabh Sinha
IBM Research, Yorktown Heights, NY, USA