ConCovUp: Effective Agent-Based Test Driver Generation for Concurrency Testing

📅 2026-05-10

🤖 AI Summary

Existing approaches struggle to automatically generate high-quality concurrent test cases, which leaves shared-memory interactions in multithreaded programs insufficiently covered. This work proposes a multi-agent framework that integrates large language models (LLMs) with program analysis. Static analysis extracts shared-memory accesses together with their calling contexts. An LLM then performs backward path tracing to synthesize inputs that satisfy complex path constraints. Finally, dynamic execution feedback iteratively refines the generated tests. By combining multi-agent LLM reasoning with both static and dynamic analysis, the method automatically generates test drivers tailored to concurrency semantics. On nine real-world C/C++ libraries, the approach raises average Shared Memory Access Pair (SMAP) coverage from 36.6%, the level reached by a general Claude Code agent baseline, to 68.1%.
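A concurrent test driver is a harness that calls a library's API from several threads so that the library's shared-memory accesses actually interleave at runtime, where a detector such as TSan can observe them. The C sketch below is illustrative only and was not produced by ConCovUp. The "library" (`lib_add`, `lib_get`, `lib_reset`, `g_total`) and the driver (`worker_args`, `worker`, `run_driver`) are hypothetical stand-ins. The lock is included so that the example runs deterministically.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical library under test: one counter shared by all callers.
 * It stands in for a real C library API and is not taken from the paper. */
static long g_total = 0;
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

void lib_add(long delta) {
    pthread_mutex_lock(&g_lock);
    g_total += delta;                 /* shared-memory write */
    pthread_mutex_unlock(&g_lock);
}

long lib_get(void) {
    pthread_mutex_lock(&g_lock);
    long v = g_total;                 /* shared-memory read */
    pthread_mutex_unlock(&g_lock);
    return v;
}

void lib_reset(void) {
    pthread_mutex_lock(&g_lock);
    g_total = 0;
    pthread_mutex_unlock(&g_lock);
}

/* Hypothetical per-thread arguments for the driver. */
typedef struct {
    long iterations;
    long delta;
} worker_args;

/* Each worker alternates writes and reads, so its accesses can interleave
 * with the other thread's accesses to g_total. */
static void *worker(void *p) {
    const worker_args *a = p;
    for (long i = 0; i < a->iterations; i++) {
        lib_add(a->delta);
        (void)lib_get();
    }
    return NULL;
}

/* Hypothetical concurrent test driver: two threads exercise the same shared
 * state. Returns the final total, or -1 if a thread cannot be created. */
long run_driver(long iterations) {
    assert(iterations >= 0);
    pthread_t t1, t2;
    worker_args a1 = {iterations, 1};
    worker_args a2 = {iterations, 2};
    lib_reset();
    if (pthread_create(&t1, NULL, worker, &a1) != 0) return -1;
    if (pthread_create(&t2, NULL, worker, &a2) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return lib_get();
}
```

For example, `run_driver(1000)` should return 3000 because the two workers add 1 and 2 per iteration. If the locking were removed from `lib_add` and `lib_get`, building the driver with `-fsanitize=thread -pthread` would let TSan report the resulting data race on `g_total`.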

📝 Abstract

Concurrency testing is essential to improve the reliability and security of multi-threaded programs. Dynamic analysis tools, such as TSan, depend on high-quality test drivers that reach critical shared-memory interactions at runtime. However, current testing practices predominantly focus on sequential logic, leaving a gap in automated concurrent test generation. Recently, large language models (LLMs) have shown promise in generating sequential tests, but they struggle to produce effective concurrent tests without a deep understanding of concurrency semantics. This paper presents ConCovUp, a multi-agent framework that combines LLMs with program analysis. ConCovUp grounds test generation in static analysis to extract shared-memory accesses and their calling contexts. To trigger hard-to-reach accesses, it introduces an LLM-driven backward tracing approach that leverages the model's semantic reasoning to deduce concrete inputs satisfying complex path constraints. It then iteratively refines the generated tests using dynamic execution feedback. Our evaluation on nine real-world C/C++ libraries shows that ConCovUp raises average Shared Memory Access Pair Coverage (SMAP Coverage) from 36.6%, achieved by the general Claude Code agent baseline, to 68.1%.
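The abstract does not define SMAP Coverage formally. The C sketch below assumes one plausible reading. A statically identified pair of access sites counts as covered once a runtime trace shows the two sites touching the same address from different threads, with at least one of the two accesses being a write. Coverage is the percentage of such pairs that are covered. The types `access_pair` and `access_event` and the functions `matches`, `pair_covered`, and `smap_coverage` are hypothetical and are not taken from the paper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a pair of static access sites that may conflict. */
typedef struct { int site_a, site_b; } access_pair;

/* Hypothetical: one dynamic memory access recorded at runtime. */
typedef struct {
    int site;          /* static access-site ID */
    int thread;        /* ID of the executing thread */
    uintptr_t addr;    /* accessed address */
    bool is_write;     /* true for a write, false for a read */
} access_event;

/* Assumed coverage rule: sites match the pair (in either order), the threads
 * differ, the address is the same, and at least one access is a write. */
static bool matches(const access_event *x, const access_event *y,
                    const access_pair *p) {
    bool sites = (x->site == p->site_a && y->site == p->site_b) ||
                 (x->site == p->site_b && y->site == p->site_a);
    return sites && x->thread != y->thread && x->addr == y->addr &&
           (x->is_write || y->is_write);
}

/* Scan all unordered event pairs in the trace for one that covers p. */
static bool pair_covered(const access_pair *p, const access_event *ev,
                         size_t nev) {
    for (size_t i = 0; i < nev; i++)
        for (size_t j = i + 1; j < nev; j++)
            if (matches(&ev[i], &ev[j], p)) return true;
    return false;
}

/* Percentage of static pairs covered by the trace (0.0 if there are none). */
double smap_coverage(const access_pair *pairs, size_t npairs,
                     const access_event *ev, size_t nev) {
    assert(pairs != NULL || npairs == 0);
    if (npairs == 0) return 0.0;
    size_t covered = 0;
    for (size_t k = 0; k < npairs; k++)
        if (pair_covered(&pairs[k], ev, nev)) covered++;
    return 100.0 * (double)covered / (double)npairs;
}
```

For example, with static pairs (1, 2) and (3, 4) and a trace in which only sites 1 and 2 reach the same address from different threads (one of them writing), `smap_coverage` returns 50.0. The quadratic scan keeps the sketch short. A practical tool would more likely index events by address.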

Problem

Research questions and pain points this paper addresses.

concurrency testing

test driver generation

shared-memory interactions

multi-threaded programs

automated testing

Innovation

Methods, ideas, or system contributions that make the work stand out.

concurrency testing

large language models

multi-agent framework