🤖 AI Summary
Testing vision deep neural networks (DNNs) in safety-critical systems is hindered by the absence of ground-truth labels for real-world inputs. Method: This paper proposes a ground-truth-free, simulation-driven testing framework. It employs a generative adversarial network (GAN) to enhance synthetic image fidelity and introduces heuristic fitness functions, leveraging transformation consistency, noise robustness, surprise adequacy, and Bayesian uncertainty estimation, to guide test input generation without an automated oracle. Contribution/Results: To our knowledge, this is the first approach achieving high-fidelity simulation input generation and efficient defect triggering without ground-truth supervision. Experiments show that transformation-consistency-driven fitness outperforms a traditional ground-truth-dependent fitness across test coverage, defect detection, and retraining gain; DNNs retrained using guided test inputs achieve up to 12.7% accuracy improvement. Because the approach does not need ground truth, it also points toward replacing costly simulators with alternatives such as diffusion models or large language models, which may be more affordable but cannot generate ground-truth data.
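For intuition, a transformation-consistency fitness can score an input without labels by measuring how much the model's prediction changes under label-preserving transformations: high disagreement flags likely defect-triggering inputs. The sketch below is illustrative only; `transformation_consistency_fitness`, `toy_model`, and the chosen transforms are hypothetical stand-ins, not the paper's implementation.

```python
import numpy as np

def transformation_consistency_fitness(model, image, transforms):
    """Label-free fitness: L1 disagreement between the model's prediction
    on an image and on label-preserving transformed variants.
    Higher disagreement suggests a likely defect-triggering input."""
    base = model(image)  # probability vector over classes
    disagreements = [float(np.abs(base - model(t(image))).sum())
                     for t in transforms]
    return max(disagreements)

# Toy stand-in classifier: softmax over simple intensity statistics.
def toy_model(img):
    logits = np.array([img.mean(), img.std(), 1.0])
    e = np.exp(logits - logits.max())
    return e / e.sum()

hflip = lambda img: img[:, ::-1]                     # horizontal flip
brighten = lambda img: np.clip(img + 0.1, 0.0, 1.0)  # small brightness shift

rng = np.random.default_rng(0)
img = rng.random((8, 8))
print(transformation_consistency_fitness(toy_model, img, [hflip, brighten]))
```

In a search loop, candidates maximizing this score would be kept as test inputs; the L1 distance between two probability vectors is bounded by 2, so scores are directly comparable across inputs.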
📝 Abstract
The generation of synthetic inputs via simulators driven by search algorithms is essential for cost-effective testing of Deep Neural Network (DNN) components in safety-critical systems. However, in many applications, simulators cannot produce the ground-truth data needed for automated test oracles and to guide the search process. To tackle this issue, we propose an approach for generating inputs for computer-vision DNNs that integrates a generative network to ensure simulator fidelity and employs heuristic fitness functions leveraging transformation consistency, noise resistance, surprise adequacy, and uncertainty estimation. We compare the performance of our fitness functions with that of a traditional fitness function leveraging ground truth; further, we assess how integrating a GAN that does not rely on ground truth affects test and retraining effectiveness. Our results suggest that leveraging transformation consistency is the best option for generating inputs for both DNN testing and retraining: it maximizes input diversity, identifies the inputs leading to the worst DNN performance, and yields the best DNN performance after retraining. Besides enabling simulator-based testing in the absence of ground truth, our findings pave the way for testing solutions that replace costly simulators with diffusion and large language models, which might be more affordable than simulators but cannot generate ground-truth data.
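The uncertainty-estimation fitness mentioned above is commonly approximated, label-free, with Monte Carlo dropout: run several stochastic forward passes and score the input by the predictive entropy of the averaged prediction. The snippet below is a minimal numpy sketch under that assumption; `stochastic_model` and its dropout scheme are hypothetical stand-ins, not the paper's exact estimator.

```python
import numpy as np

def mc_dropout_uncertainty(stochastic_model, image, n_samples=30, seed=0):
    """Score an input by the predictive entropy of the mean prediction
    over repeated stochastic (dropout-enabled) forward passes.
    High entropy marks inputs the DNN is uncertain about."""
    rng = np.random.default_rng(seed)
    preds = np.stack([stochastic_model(image, rng) for _ in range(n_samples)])
    mean_pred = preds.mean(axis=0)  # still a valid probability vector
    return float(-np.sum(mean_pred * np.log(mean_pred + 1e-12)))

# Toy stochastic classifier: Bernoulli dropout on a fixed feature vector.
def toy_stochastic_model(img, rng, p_drop=0.3):
    features = np.array([img.mean(), img.std(), 0.5])
    mask = rng.random(features.shape) >= p_drop  # keep each unit w.p. 0.7
    logits = features * mask / (1.0 - p_drop)    # inverted-dropout scaling
    e = np.exp(logits - logits.max())
    return e / e.sum()

img = np.random.default_rng(1).random((8, 8))
print(mc_dropout_uncertainty(toy_stochastic_model, img))
```

In a search-based setup, this score would be one objective among several (alongside, e.g., transformation consistency), favoring inputs the DNN is least confident about.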