🤖 AI Summary
Current large language models lack targeted guidance toward high-risk execution paths when generating test cases, which limits their ability to uncover subtle software defects. This work proposes GLMTest, presented as the first framework to integrate code property graphs into LLM-based test generation. It combines a graph neural network with the Qwen2.5-Coder-7B-Instruct language model to jointly model program structure and code semantics, so that test case generation can be conditioned on specific execution branches. On the TestGenEval benchmark, the approach improves branch accuracy from 27.4% to 50.2% in a comparison with the state-of-the-art LLMs Claude-Sonnet-4.5 and GPT-4o-mini.
📝 Abstract
Recent advances in large language models for test case generation have improved branch coverage via prompt-engineered mutations. However, they still lack principled mechanisms for steering models toward specific high-risk execution branches, limiting their effectiveness for discovering subtle bugs and security vulnerabilities. We propose GLMTest, the first program structure-aware LLM framework for targeted test case generation. It integrates code property graphs and code semantics, using a graph neural network and a language model to condition test case generation on execution branches. This structured conditioning enables controllable, branch-targeted test case generation, thereby potentially enhancing bug and security risk discovery. Experiments on real-world projects show that GLMTest, built on a Qwen2.5-Coder-7B-Instruct model, improves branch accuracy from 27.4% to 50.2% on the TestGenEval benchmark compared with state-of-the-art LLMs, namely Claude-Sonnet-4.5 and GPT-4o-mini.
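The abstract does not define branch accuracy. One plausible reading, assumed here for illustration only, is the fraction of targeted branches that the generated test actually executes. The sketch below computes the metric under that assumption; the `BranchAttempt` type, the branch identifier format, and the sample data are all hypothetical and are not taken from GLMTest or TestGenEval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchAttempt:
    # Hypothetical branch identifier, e.g. "function:source_line->target_line".
    target_branch: str
    # Branches that the generated test actually executed (assumed to come
    # from a coverage tool; how they are collected is out of scope here).
    executed_branches: frozenset


def branch_hit_accuracy(attempts):
    """Return the fraction of attempts whose test executed its target branch.

    This is an assumed definition of the metric, not the paper's.
    """
    if not attempts:
        return 0.0
    hits = sum(1 for a in attempts if a.target_branch in a.executed_branches)
    return hits / len(attempts)


# Made-up example: four targeted branches, two of which were hit.
attempts = [
    BranchAttempt("f:3->4", frozenset({"f:1->3", "f:3->4"})),
    BranchAttempt("f:3->6", frozenset({"f:1->3", "f:3->4"})),
    BranchAttempt("g:2->5", frozenset({"g:2->5"})),
    BranchAttempt("g:2->3", frozenset()),
]
print(branch_hit_accuracy(attempts))  # 0.5
```

Under this reading, a test that reaches high overall coverage but misses its assigned branch still counts as a miss, which is what separates a targeted metric from plain branch coverage.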