SPARC: Scenario Planning and Reasoning for Automated C Unit Test Generation

📅 2026-02-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of automatically generating unit tests for C programs, where the gap between semantic intent and syntactic constraints (such as pointer manipulation and manual memory management) often yields tests that fail to compile, achieve low coverage, or contain invalid assertions. To bridge this gap, the paper proposes a neuro-symbolic, scenario-aware framework that integrates scenario planning and program structure-aware reasoning into large language model-driven test generation. By combining control-flow graph analysis, an operation map, path-directed synthesis, and an iterative self-repair mechanism guided by compiler and runtime feedback, the approach mitigates the leap-to-code failure mode and produces high-fidelity, maintainable test cases. Evaluated on 59 real-world and algorithmic projects, the method improves line coverage, branch coverage, and mutation score over a vanilla prompting baseline by 31.36%, 26.01%, and 20.78%, respectively, with 94.3% of repaired tests retained, matching or exceeding KLEE and significantly outperforming existing baselines.

📝 Abstract
Automated unit test generation for C remains a formidable challenge due to the semantic gap between high-level program intent and the rigid syntactic constraints of pointer arithmetic and manual memory management. While Large Language Models (LLMs) exhibit strong generative capabilities, direct intent-to-code synthesis frequently suffers from the leap-to-code failure mode, where models prematurely emit code without grounding in program structure, constraints, and semantics. This results in non-compilable tests, hallucinated function signatures, low branch coverage, and semantically irrelevant assertions that cannot properly capture bugs. We introduce SPARC, a neuro-symbolic, scenario-based framework that bridges this gap through four stages: (1) Control Flow Graph (CFG) analysis, (2) an Operation Map that grounds LLM reasoning in validated utility helpers, (3) path-targeted test synthesis, and (4) an iterative, self-correcting validation loop using compiler and runtime feedback. We evaluate SPARC on 59 real-world and algorithmic subjects, where it outperforms the vanilla prompt generation baseline by 31.36% in line coverage, 26.01% in branch coverage, and 20.78% in mutation score, matching or exceeding the symbolic execution tool KLEE on complex subjects. SPARC retains 94.3% of tests through iterative repair and produces code with significantly higher developer-rated readability and maintainability. By aligning LLM reasoning with program structure, SPARC provides a scalable path for industrial-grade testing of legacy C codebases.
Problem

Research questions and friction points this paper is trying to address.

automated unit test generation
C programming
semantic gap
LLM code generation
program semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

neuro-symbolic
scenario-based test generation
LLM grounding
iterative self-correction
C unit testing
Jaid Monwar Chowdhury
Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
Chi-An Fu
National Taiwan University, Taipei, Taiwan
Reyhaneh Jabbarvand
Siebel School of Computing and Data Science, University of Illinois at Urbana-Champaign
Neuro-symbolic program analysis; Code LLMs (evaluation, interpretability, and benchmarking)