Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection

📅 2025-07-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address fine-grained out-of-distribution (OOD) detection, specifically distinguishing ambiguous OOD samples that lie near the in-distribution (InD) decision boundary, this paper proposes a foundation-model-based framework for synthesizing such samples. The method combines multimodal large language models (MLLMs) with diffusion models: contextual prompts from the MLLM guide an iterative inpainting process that produces synthetic OOD images close to the classification boundary. Gradients of an OOD score such as the energy score are then used to adjust the diffusion noise, so that sampling concentrates in boundary-proximal regions. The CLIP image encoder and the negative-label features derived from the text encoder are fine-tuned jointly on these images. On ImageNet, the approach improves AUROC by 2.80% and reduces FPR95 by 11.13%, with minimal increases in parameters and runtime.
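For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how AUROC and FPR95 are commonly computed from detector scores. It assumes the convention that a higher score means "more InD-like"; the function names and the toy scores are illustrative and are not taken from the paper or its code.

```python
import math


def auroc(ind_scores, ood_scores):
    """Probability that a random InD sample outscores a random OOD sample.

    Ties count as half a win. This O(n*m) loop is only meant for small examples.
    """
    wins = 0.0
    for s_in in ind_scores:
        for s_out in ood_scores:
            if s_in > s_out:
                wins += 1.0
            elif s_in == s_out:
                wins += 0.5
    return wins / (len(ind_scores) * len(ood_scores))


def fpr_at_95_tpr(ind_scores, ood_scores):
    """Fraction of OOD samples accepted as InD when at least 95% of InD samples are accepted."""
    ranked = sorted(ind_scores, reverse=True)
    k = math.ceil(0.95 * len(ranked))   # number of InD samples that must pass
    threshold = ranked[k - 1]           # lowest score among those samples
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


# Toy scores (made up): one OOD sample sits above the weakest InD sample.
ind = [0.9, 0.8, 0.7, 0.6]
ood = [0.65, 0.3, 0.2, 0.1]
print(auroc(ind, ood))          # 0.9375
print(fpr_at_95_tpr(ind, ood))  # 0.25
```

Near-boundary OOD samples are exactly the ones that land above the 95% threshold, which is why FPR95 is sensitive to them.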

📝 Abstract
Pre-trained vision-language models have exhibited remarkable abilities in detecting out-of-distribution (OOD) samples. However, some challenging OOD samples, which lie close to in-distribution (InD) data in image feature space, can still lead to misclassification. The emergence of foundation models like diffusion models and multimodal large language models (MLLMs) offers a potential solution to this issue. In this work, we propose SynOOD, a novel approach that harnesses foundation models to generate synthetic, challenging OOD data for fine-tuning CLIP models, thereby enhancing boundary-level discrimination between InD and OOD samples. Our method uses an iterative in-painting process guided by contextual prompts from MLLMs to produce nuanced, boundary-aligned OOD samples. These samples are refined through noise adjustments based on gradients from OOD scores like the energy score, effectively sampling from the InD/OOD boundary. With these carefully synthesized images, we fine-tune the CLIP image encoder and negative label features derived from the text encoder to strengthen connections between near-boundary OOD samples and a set of negative labels. Finally, SynOOD achieves state-of-the-art performance on the large-scale ImageNet benchmark, with minimal increases in parameters and runtime. Our approach significantly surpasses existing methods, improving AUROC by 2.80% and reducing FPR95 by 11.13%. Code is available at https://github.com/Jarvisgivemeasuit/SynOOD.
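The noise-refinement idea can be illustrated with a toy sketch. The energy score used here is the standard E = -T * logsumexp(logits / T). Everything else is an assumption made for illustration: `toy_logits` is a hypothetical linear scorer standing in for the real "diffusion model generates an image, CLIP scores it" pipeline, and `target_energy` is a hypothetical stand-in for an energy level at the InD/OOD boundary. The squared-gap objective is one plausible reading of "sampling from the boundary", not the loss used in SynOOD.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def energy(logits, temperature=1.0):
    """Energy score; lower energy means the input looks more InD-like."""
    return -temperature * logsumexp([l / temperature for l in logits])


def toy_logits(noise, weights):
    """Hypothetical linear scorer: logits = W @ noise (stand-in for generate + classify)."""
    return [sum(w * z for w, z in zip(row, noise)) for row in weights]


def energy_grad_wrt_noise(noise, weights, temperature=1.0):
    """Analytic gradient for the toy scorer: dE/dz_j = -sum_k softmax(logits / T)_k * W[k][j]."""
    scaled = [l / temperature for l in toy_logits(noise, weights)]
    lse = logsumexp(scaled)
    probs = [math.exp(s - lse) for s in scaled]
    return [-sum(p * row[j] for p, row in zip(probs, weights))
            for j in range(len(noise))]


def refine_noise(noise, weights, target_energy, step=0.1, iters=200):
    """Gradient descent on (E(z) - target_energy)^2, nudging z toward the target energy level."""
    z = list(noise)
    for _ in range(iters):
        gap = energy(toy_logits(z, weights)) - target_energy
        grad = energy_grad_wrt_noise(z, weights)
        z = [zj - step * 2.0 * gap * gj for zj, gj in zip(z, grad)]
    return z


# Start from a confidently InD-like point (very low energy) and move toward the target level.
W = [[1.0, 0.0], [0.0, 1.0]]
z0 = [3.0, -1.0]
z = refine_noise(z0, W, target_energy=-1.0)
print(energy(toy_logits(z0, W)), energy(toy_logits(z, W)))
```

In a real pipeline the analytic gradient would be replaced by automatic differentiation through the generator and the scorer, which is far more expensive than this toy loop suggests.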
Problem

Research questions and friction points this paper is trying to address.

Detecting challenging near-boundary OOD samples accurately
Generating synthetic OOD data using foundation models
Improving CLIP model discrimination between InD and OOD
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic OOD samples using foundation models
Fine-tunes CLIP with boundary-aligned OOD samples
Improves OOD detection via noise-adjusted gradient refinement
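To make the role of negative labels concrete, here is a minimal sketch of a score in the style of negative-label CLIP detectors: the share of softmax mass that falls on InD class labels rather than on negative labels. The exact score SynOOD uses is not stated in this summary, so the function name, the temperature value, and the toy similarities below are assumptions for illustration only.

```python
import math


def neg_label_score(pos_sims, neg_sims, temperature=0.01):
    """Share of softmax mass on InD (positive) labels; higher means more InD-like.

    pos_sims: image-text similarities to InD class labels.
    neg_sims: image-text similarities to negative labels.
    """
    scaled = [s / temperature for s in list(pos_sims) + list(neg_sims)]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    return sum(exps[:len(pos_sims)]) / sum(exps)


# Toy similarities (made up): the first image matches an InD label best,
# the second matches a negative label best.
ind_like = neg_label_score(pos_sims=[0.30, 0.10], neg_sims=[0.12, 0.08])
ood_like = neg_label_score(pos_sims=[0.12, 0.10], neg_sims=[0.30, 0.08])
print(ind_like > 0.5 > ood_like)  # True
```

Under this reading, fine-tuning on synthesized near-boundary OOD images would aim to raise their similarity to negative labels, which lowers the score for exactly the samples that are otherwise hardest to reject.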
Jinglun Li
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai
Kaixun Jiang
Fudan University
Computer Vision, Adversarial Examples
Zhaoyu Chen
TikTok
AI Security, Trustworthy AI, Multimodal AI, Generative AI
Bo Lin
JIIOV Technology, Beijing
Yao Tang
JIIOV Technology, Beijing
Weifeng Ge
Fudan University
Humanoid Robot, Computer Vision, Artificial Intelligence, AI4Science
Wenqiang Zhang
College of Intelligent Robotics and Advanced Manufacturing, Fudan University, Shanghai; Shanghai Key Lab of Intelligent Information Processing, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai