🤖 AI Summary
Symbolic execution faces fundamental limitations in safety-critical C/C++ embedded cyber-physical systems (CPS), including the inability to handle black-box components and a dependence on source-level constraint solving. Method: This paper proposes a fuzzing-driven mutation testing approach tailored to CPS. It integrates Clang-based instrumentation, AddressSanitizer (ASan) for memory error detection, laf-intel for enhanced coverage guidance, and the AFL fuzzer to automatically generate high-coverage inputs that kill mutants surviving the existing tests. Results: Evaluated on software components from operational satellite systems, the approach detects 40% to 90% of the mutants that developer-written test suites leave undetected. Compared to symbolic execution, it detects a higher percentage of live mutants, by up to 50 percentage points, without requiring source-level constraint solving. Moreover, a hybrid fuzzing+symbolic strategy yields a negligible gain (<1 percentage point), which supports the effectiveness and practicality of pure fuzzing-based mutation testing for CPS.
📝 Abstract
Mutation testing can help minimize the delivery of faulty software. Therefore, it is a recommended practice for developing embedded software in safety-critical cyber-physical systems (CPS). However, state-of-the-art mutation testing techniques for C and C++ software, which are common languages for CPS, depend on symbolic execution. Unfortunately, symbolic execution's limitations hinder its applicability (e.g., to systems with black-box components). We propose relying on fuzz testing, which has demonstrated its effectiveness for C and C++ software. Fuzz testing tools automatically create test inputs that explore program branches in various ways, exercising statements in different program states and thus enabling the detection of mutants, which is our objective. We empirically evaluated our approach using software components from operational satellite systems. Our assessment shows that our approach can detect between 40% and 90% of the mutants not detected by developers' test suites. Further, we empirically determined that the best results are obtained by combining the Clang compiler, a memory address sanitizer, and laf-intel instrumentation to collect coverage and guide fuzzing. Our approach detects a significantly higher percentage of live mutants than symbolic execution, with an increase of up to 50 percentage points. Although combining fuzzing with symbolic execution kills additional mutants, the benefit is minimal (a gain of less than one percentage point).
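To make the idea of detecting a mutant with generated inputs concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a hypothetical function `saturating_add` and two hand-written mutants, and it uses plain random input generation in place of a coverage-guided fuzzer such as AFL. A mutant counts as killed as soon as one generated input makes its output differ from the original's; a mutant that never differs stays live.

```python
import random


def saturating_add(a: int, b: int, limit: int = 255) -> int:
    """Hypothetical original function: add two values, clamping at `limit`."""
    total = a + b
    if total > limit:
        return limit
    return total


def mutant_relational(a: int, b: int, limit: int = 255) -> int:
    """Hypothetical mutant: '>' replaced by '=='; it no longer clamps large sums."""
    total = a + b
    if total == limit:
        return limit
    return total


def mutant_equivalent(a: int, b: int, limit: int = 255) -> int:
    """Hypothetical mutant: '>' replaced by '>='; behaviour is unchanged."""
    total = a + b
    if total >= limit:
        return limit
    return total


def fuzz_for_kill(original, mutant, trials: int = 1000, seed: int = 0):
    """Return the first generated input on which the outputs differ, else None.

    Assumption: inputs are two random byte values. A real fuzzer would
    mutate inputs under coverage feedback instead of sampling blindly.
    """
    rng = random.Random(seed)  # fixed seed keeps the run reproducible
    for _ in range(trials):
        a, b = rng.randrange(256), rng.randrange(256)
        if original(a, b) != mutant(a, b):
            return (a, b)  # this input kills the mutant
    return None  # mutant is still live after all trials


if __name__ == "__main__":
    print("relational mutant killed by:", fuzz_for_kill(saturating_add, mutant_relational))
    print("equivalent mutant killed by:", fuzz_for_kill(saturating_add, mutant_equivalent))
```

The second mutant illustrates why some mutants remain live regardless of the input generator: it is semantically equivalent to the original, so no input can distinguish the two.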