Real-World Fault Detection for C-Extended Python Projects with Automated Unit Test Generation

📅 2026-03-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem that crashes in C extensions halt automated test generation for Python, which prevents the detection of real faults in public APIs and blocks coverage of the non-crashing parts of the code. To overcome this, we add subprocess execution to the Python test generation tool Pynguin, separating test generation from test execution. Generation therefore continues even when a C extension crashes, and each crash is captured as a reproducible, crash-revealing test case. Evaluated on 1,648 modules from 21 popular libraries with C extensions, the approach allows automated testing of up to 56.5% more modules and uncovers 213 unique crash causes, revealing 32 previously unknown faults.

📝 Abstract

Many popular Python libraries use C-extensions for performance-critical operations, allowing users to combine the best of the two worlds: the simplicity and versatility of Python and the performance of C. A drawback of this approach is that errors raised in C can bypass Python's exception handling and cause the entire interpreter to crash. These crashes are real faults if they occur when calling a public API. While automated test generation should, in principle, detect such faults, crashes in native code can halt the test process entirely, preventing detection or reproduction of the underlying errors and inhibiting coverage of non-crashing parts of the code. To overcome this problem, we propose separating the generation and execution stages of the test-generation process. We therefore adapt Pynguin, an automated test case generation tool for Python, to use subprocess-execution. Executing each generated test in an isolated subprocess prevents a crash from halting the test generation process itself. This allows us to (1) detect such faults, (2) generate reproducible crash-revealing test cases for them, (3) study the underlying faults, and (4) generate tests for non-crashing parts of the code. To evaluate our approach, we created a dataset consisting of 1648 modules from 21 popular Python libraries with C-extensions. Subprocess-execution allowed automated testing of up to 56.5% more modules and discovered 213 unique crash causes, revealing 32 previously unknown faults.
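The isolation idea described in the abstract can be illustrated with a minimal sketch. This is not Pynguin's implementation: the helper name `run_isolated`, the outcome labels, and the use of `os.abort()` as a stand-in for a crash inside native code are all assumptions made for illustration. The sketch also assumes a POSIX system, where a child process killed by a signal is reported with a negative return code.

```python
import subprocess
import sys


def run_isolated(test_source: str, timeout: float = 10.0) -> str:
    """Run test code in a fresh interpreter and classify the outcome.

    Hypothetical helper for illustration only; not part of Pynguin.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", test_source],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode == 0:
        return "pass"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by a signal (for example SIGABRT or SIGSEGV).
        return "crash"
    # Positive exit code: an ordinary Python-level failure.
    return "fail"


# os.abort() stands in for a crash inside native code. The except
# clause never runs because the interpreter is terminated outright.
CRASHING_TEST = """
import os
try:
    os.abort()
except BaseException:
    pass
"""

results = {
    "ok": run_isolated("assert 1 + 1 == 2"),
    "failing": run_isolated("assert 1 + 1 == 3"),
    "crashing": run_isolated(CRASHING_TEST),
}
print(results)
```

Because the crash happens in the child interpreter, the parent process survives, records the outcome, and can move on to the next test. The source string that triggered the crash doubles as a reproducible test case.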

Problem

Research questions and friction points this paper is trying to address.

C-extensions

fault detection

crash

automated testing

Python

Innovation

Methods, ideas, or system contributions that make the work stand out.

C-extensions

automated test generation

subprocess execution