🤖 AI Summary
Existing methods for testing deep learning systems struggle to generate high-fidelity, controllable, and realistic failure-inducing test cases when no labeled failure data is available. To address this challenge, this work proposes HyNeA, a novel approach that introduces hypernetworks into diffusion models to enable instance-level control for targeted test case generation—without requiring fine-tuning or explicit conditioning mechanisms. By decoupling the generation process from reliance on annotated failure examples, HyNeA significantly enhances both the controllability and diversity of generated test cases while preserving high realism. Moreover, it achieves these advantages at a substantially lower computational cost than conventional search-based strategies.
📝 Abstract
The increasing deployment of deep learning systems requires systematic evaluation of their reliability in real-world scenarios. Traditional gradient-based adversarial attacks introduce small perturbations that rarely correspond to realistic failures and mainly assess robustness rather than functional behavior. Generative test generation methods offer an alternative but are often limited to simple datasets or constrained input domains. Although diffusion models enable high-fidelity image synthesis, their computational cost and limited controllability restrict their applicability to large-scale testing. We present HyNeA, a generative testing method that enables direct and efficient control over diffusion-based generation. HyNeA provides dataset-free controllability through hypernetworks, allowing targeted manipulation of the generative process without relying on architecture-specific conditioning mechanisms or dataset-driven adaptations such as fine-tuning. Its training strategy supports instance-level tuning, identifying failure-inducing test cases without requiring datasets that explicitly contain examples of similar failures. This enables the targeted generation of realistic failure cases at substantially lower computational cost than search-based methods. Experimental results show that HyNeA improves controllability and test diversity over existing generative test generators and generalizes to domains where failure-labeled training data is unavailable.
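The abstract's core mechanism — a hypernetwork that modulates a frozen generator's weights, with only the hypernetwork's control input tuned per instance — can be illustrated with a toy sketch. This is not HyNeA's actual architecture (the paper's details are not given here); it is a minimal NumPy analogue in which a frozen linear layer stands in for a diffusion model's weights, a small two-layer hypernetwork maps a low-dimensional control vector `z` to a weight delta, and instance-level tuning optimizes `z` alone to push the output toward a hypothetical failure-inducing target. All names and shapes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen generator layer (stand-in for a diffusion UNet weight matrix).
d_in, d_out, d_ctrl = 8, 8, 4
W_base = rng.standard_normal((d_out, d_in)) * 0.1

# Hypernetwork: maps a small control vector z to a weight delta dW.
# (The two-layer shape and names H1/H2 are illustrative assumptions.)
H1 = rng.standard_normal((16, d_ctrl)) * 0.1
H2 = rng.standard_normal((d_out * d_in, 16)) * 0.1

def delta_w(z):
    h = np.tanh(H1 @ z)
    return (H2 @ h).reshape(d_out, d_in)

def generate(x, z):
    # Base weights stay frozen; only the hypernetwork output modulates them.
    return (W_base + delta_w(z)) @ x

# Instance-level tuning: optimize z (not the model weights) so the output
# moves toward a hypothetical failure-inducing target.
x = rng.standard_normal(d_in)
target = rng.standard_normal(d_out)
z = np.zeros(d_ctrl)
lr = 0.5
for _ in range(200):
    err = generate(x, z) - target            # residual toward the target
    # Gradient of 0.5 * ||err||^2 w.r.t. z via the chain rule.
    h = np.tanh(H1 @ z)
    dW_grad = np.outer(err, x).reshape(-1)   # d loss / d (dW, flattened)
    dz = H1.T @ ((1 - h**2) * (H2.T @ dW_grad))
    z -= lr * dz

loss_before = np.linalg.norm(generate(x, np.zeros(d_ctrl)) - target)
loss_after = np.linalg.norm(generate(x, z) - target)
```

Tuning the 4-dimensional `z` instead of the full weight matrix mirrors the abstract's claim of targeted control without fine-tuning: the search space per test case is tiny, which is why such an approach can be far cheaper than search over full inputs or model parameters.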