🤖 AI Summary
To address the insufficient robustness of deep learning models caused by a scarcity of high-quality edge-case samples in training data, this paper proposes an automated, text-guided edge-case synthesis framework. The method fine-tunes a large language model (LLM) via preference learning so that it can controllably rewrite initial prompts into diverse, highly challenging textual descriptions. These descriptions are then fed into a text-to-image diffusion model to synthesize the corresponding visual edge-case samples, yielding end-to-end, intent-driven generation from semantic specification to difficult images. The framework eliminates reliance on manual curation, enabling continuous, scalable, and interpretable edge-case synthesis. Evaluated on the FishEye8K object detection benchmark, the synthesized data significantly improves model performance, outperforming both conventional data augmentation and hand-crafted prompt engineering. The implementation is publicly available.
📝 Abstract
The performance of deep neural networks is strongly influenced by the quality of their training data. However, mitigating dataset bias by manually curating challenging edge cases remains a major bottleneck. To address this, we propose an automated pipeline for text-guided edge-case synthesis. Our approach employs a Large Language Model, fine-tuned via preference learning, to rephrase image captions into diverse textual prompts that steer a Text-to-Image model toward generating difficult visual scenarios. Evaluated on the FishEye8K object detection benchmark, our method achieves superior robustness, surpassing both naive augmentation and manually engineered prompts. This work establishes a scalable framework that shifts data curation from manual effort to automated, targeted synthesis, offering a promising direction for developing more reliable and continuously improving AI systems. Code is available at https://github.com/gokyeongryeol/ATES.
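The abstract does not specify which preference-learning objective is used to fine-tune the prompt rewriter. As a minimal illustrative sketch, assuming a DPO-style (Direct Preference Optimization) objective over pairs of (preferred, rejected) prompt rewrites, the per-pair loss could look like this; the function name and arguments are hypothetical, not taken from the paper's code:

```python
import math

def dpo_loss(logp_w_policy, logp_w_ref, logp_l_policy, logp_l_ref, beta=0.1):
    """DPO-style loss for one (chosen, rejected) prompt-rewrite pair.

    logp_*: summed token log-probabilities of the chosen (w) and rejected (l)
    rewrite under the trainable policy LLM and a frozen reference LLM.
    beta:   temperature controlling how strongly preferences are enforced.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) favors the preferred rewrite over the rejected one.
    margin = beta * ((logp_w_policy - logp_w_ref) - (logp_l_policy - logp_l_ref))
    # Negative log-sigmoid of the margin: small when the policy already
    # ranks the preferred rewrite higher, large when it ranks it lower.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# With a positive margin the loss falls below -log(sigmoid(0)) = log 2.
loss = dpo_loss(-12.0, -15.0, -14.0, -13.0, beta=0.1)
```

Minimizing this loss over many preference pairs pushes the rewriter toward prompts judged more challenging, without needing an explicit reward model; the actual training setup in the paper may differ.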