Call-Chain-Aware LLM-Based Test Generation for Java Projects

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing large language model (LLM)-based approaches to unit test generation for Java projects with complex inter-class dependencies, deep call chains, and intricate object initialization requirements. The authors propose CAT, a method that uses static analysis to extract call chains and dependency context and incorporates them into LLM prompts. CAT builds executable test contexts that model caller-callee relationships, object constructors, and third-party dependencies, and it iteratively fixes tests when generation fails. On Defects4J, CAT improves line coverage by 18.04% and branch coverage by 21.74% over the state-of-the-art PANTA approach, and it achieves consistently superior performance on four real-world GitHub projects released after the LLM's cut-off date.
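The iterative repair step can be pictured as a bounded generate-run-feedback loop. The sketch below is only an illustration under stated assumptions: `RepairLoopSketch`, `generateWithRepair`, and the two functional parameters standing in for the LLM and the test runner are hypothetical names, not CAT's actual implementation or API.

```java
import java.util.Optional;
import java.util.function.BiFunction;
import java.util.function.Function;

/** Hypothetical sketch of a generate-run-repair loop; not the paper's code. */
public class RepairLoopSketch {

    /**
     * Asks the generator for a test, runs it, and feeds any failure message
     * back into the next request, for at most maxRounds attempts.
     *
     * @param llm    stand-in for the model: (prompt, last error or null) -> test source
     * @param runner stand-in for compile/run: test source -> error message, empty if it passes
     * @return the first passing test source, or empty if every round failed
     */
    static Optional<String> generateWithRepair(
            BiFunction<String, String, String> llm,
            Function<String, Optional<String>> runner,
            String prompt,
            int maxRounds) {
        String lastError = null; // no feedback on the first round
        for (int round = 0; round < maxRounds; round++) {
            String testSource = llm.apply(prompt, lastError);
            Optional<String> failure = runner.apply(testSource);
            if (!failure.isPresent()) {
                return Optional.of(testSource); // test compiled and passed
            }
            lastError = failure.get(); // carry the error into the next attempt
        }
        return Optional.empty();
    }
}
```

Bounding the loop with `maxRounds` is a design choice of this sketch; the source does not state how many repair iterations CAT allows.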

📝 Abstract

Large language models (LLMs) have recently shown strong potential for generating project-level unit tests. However, existing state-of-the-art approaches primarily rely on execution-path information to guide prompt construction, which is often insufficient for complex software systems with rich inter-class dependencies, deep call chains, and intricate object initialization requirements. In this paper, we present CAT, a novel call-chain-aware LLM-based test generation approach that explicitly incorporates call-chain and dependency contexts into prompts through dedicated static analysis. To construct executable, semantically valid test contexts, CAT systematically models caller-callee relationships, object constructors, and third-party dependencies, and supports iterative test fixing when generation failures occur. We evaluate CAT on the widely used Defects4J benchmark and on four real-world GitHub projects released after the LLM's cut-off date. The results show that, across projects in Defects4J, CAT improves line and branch coverage by 18.04% and 21.74%, respectively, over the state-of-the-art approach PANTA, while consistently achieving superior performance on post-cutoff real-world projects. An ablation study further demonstrates the importance of call-chain and dependency contexts in CAT.
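To make the idea of call-chain and dependency context more concrete, here is a minimal sketch that enumerates call chains from a focal method over a precomputed call graph and renders them, together with constructor signatures, as prompt text. Everything here is an assumption for illustration: the class `CallChainSketch`, its methods, the depth limit, the prompt layout, and the example method names are hypothetical, and the call graph is supplied by hand rather than produced by the paper's static analysis.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of call-chain context extraction; not the paper's code. */
public class CallChainSketch {

    /** Enumerates acyclic caller-to-callee chains starting at focal, at most maxDepth methods long. */
    static List<List<String>> callChains(Map<String, List<String>> callGraph, String focal, int maxDepth) {
        List<List<String>> chains = new ArrayList<>();
        walk(callGraph, focal, maxDepth, new ArrayDeque<>(), chains);
        return chains;
    }

    private static void walk(Map<String, List<String>> graph, String method, int maxDepth,
                             Deque<String> path, List<List<String>> out) {
        path.addLast(method);
        boolean extended = false;
        if (path.size() < maxDepth) {
            for (String callee : graph.getOrDefault(method, Collections.<String>emptyList())) {
                if (!path.contains(callee)) { // skip cycles such as recursion
                    extended = true;
                    walk(graph, callee, maxDepth, path, out);
                }
            }
        }
        if (!extended) {
            out.add(new ArrayList<>(path)); // chain ends at a leaf or at the depth limit
        }
        path.removeLast();
    }

    /** Renders chains and constructor signatures as a plain-text prompt section (assumed layout). */
    static String promptContext(String focal, List<List<String>> chains, Map<String, String> constructors) {
        StringBuilder sb = new StringBuilder();
        sb.append("Focal method: ").append(focal).append('\n');
        sb.append("Call chains:\n");
        for (List<String> chain : chains) {
            sb.append("  ").append(String.join(" -> ", chain)).append('\n');
        }
        sb.append("Constructors:\n");
        for (Map.Entry<String, String> e : constructors.entrySet()) {
            sb.append("  ").append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-written call graph standing in for static-analysis output.
        Map<String, List<String>> graph = new LinkedHashMap<>();
        graph.put("OrderService.place", Arrays.asList("Inventory.reserve", "Billing.charge"));
        graph.put("Billing.charge", Arrays.asList("TaxTable.rateFor"));

        Map<String, String> constructors = new LinkedHashMap<>();
        constructors.put("OrderService", "OrderService(Inventory inventory, Billing billing)");

        List<List<String>> chains = callChains(graph, "OrderService.place", 3);
        System.out.print(promptContext("OrderService.place", chains, constructors));
    }
}
```

The depth limit keeps the prompt bounded when chains are deep; whether CAT truncates chains this way is not stated in the abstract.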

Problem

Research questions and friction points this paper is trying to address.

test generation

call chain

dependency context

LLM

unit testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

call-chain-aware

LLM-based test generation

static analysis