Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection

📅 2025-07-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper targets fine-grained out-of-distribution (OOD) detection, specifically the problem of distinguishing ambiguous OOD samples that lie near the in-distribution (InD) decision boundary, and proposes a foundation-model-based framework for synthesizing such samples. The method is presented as the first to combine multimodal large language models (MLLMs) with diffusion models: MLLM-generated prompts guide an iterative inpainting process that produces high-fidelity synthetic OOD images concentrated at the classification boundary. A noise optimization step driven by the gradient of the energy score then steers sampling toward boundary-proximal regions. Finally, the CLIP image encoder and the negative-label feature embeddings are fine-tuned jointly on the synthesized images. On ImageNet, the approach improves AUROC by 2.80% and reduces FPR95 by 11.13%, with negligible parameter overhead and inference cost.
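The energy score mentioned in the summary can be illustrated with a minimal sketch. It assumes the widely used free-energy definition, E(x) = -T · log Σ exp(logit_i / T), applied to a list of class logits; the function name, the temperature default, and the toy logits are illustrative assumptions, not taken from the SynOOD code.

```python
import math

def energy_score(logits, temperature=1.0):
    """Free-energy score E(x) = -T * log(sum_i exp(logit_i / T)).

    Lower (more negative) energy indicates a more confident,
    InD-like prediction under this convention.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -temperature * lse

# Toy logits (hypothetical): one peaked prediction, one flat prediction.
confident = energy_score([10.0, 0.0, 0.0])
uncertain = energy_score([1.0, 1.0, 1.0])
```

Under this convention the peaked prediction receives the lower energy, so a threshold on the energy separates confident InD-like inputs from ambiguous ones.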

📝 Abstract

Pre-trained vision-language models have exhibited remarkable abilities in detecting out-of-distribution (OOD) samples. However, some challenging OOD samples, which lie close to in-distribution (InD) data in image feature space, can still lead to misclassification. The emergence of foundation models such as diffusion models and multimodal large language models (MLLMs) offers a potential solution to this issue. In this work, we propose SynOOD, a novel approach that harnesses foundation models to generate synthetic, challenging OOD data for fine-tuning CLIP models, thereby enhancing boundary-level discrimination between InD and OOD samples. Our method uses an iterative in-painting process guided by contextual prompts from MLLMs to produce nuanced, boundary-aligned OOD samples. These samples are refined through noise adjustments based on gradients of OOD scores such as the energy score, effectively sampling from the InD/OOD boundary. With these carefully synthesized images, we fine-tune the CLIP image encoder and the negative label features derived from the text encoder to strengthen connections between near-boundary OOD samples and a set of negative labels. SynOOD achieves state-of-the-art performance on the large-scale ImageNet benchmark, with minimal increases in parameters and runtime. Our approach significantly surpasses existing methods, improving AUROC by 2.80% and reducing FPR95 by 11.13%. Code is available at https://github.com/Jarvisgivemeasuit/SynOOD.
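To make the role of negative labels concrete, here is a minimal sketch of one common way to score an image against InD labels and negative labels with CLIP-style features. The formulation (softmax mass assigned to InD labels over the union of InD and negative labels), the temperature value, and the 3-dimensional toy features are assumptions for illustration; the abstract does not specify the exact score SynOOD uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def neg_label_score(image_feat, ind_text_feats, neg_text_feats, tau=0.01):
    """Fraction of softmax mass on InD labels (assumed formulation).

    Values near 1 suggest InD; values near 0 suggest OOD.
    """
    ind = [cosine(image_feat, t) / tau for t in ind_text_feats]
    neg = [cosine(image_feat, t) / tau for t in neg_text_feats]
    m = max(ind + neg)  # shift by the max for numerical stability
    ind_mass = sum(math.exp(s - m) for s in ind)
    neg_mass = sum(math.exp(s - m) for s in neg)
    return ind_mass / (ind_mass + neg_mass)

# Hypothetical toy features standing in for CLIP text/image embeddings.
ind_labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
neg_labels = [[0.0, 0.0, 1.0]]
score_ind = neg_label_score([0.9, 0.1, 0.0], ind_labels, neg_labels)
score_ood = neg_label_score([0.1, 0.0, 0.9], ind_labels, neg_labels)
```

In this picture, fine-tuning the negative label features amounts to moving the `neg_labels` vectors (and the image encoder output) so that near-boundary OOD images land closer to a negative label than to any InD label.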

Problem

Research questions and friction points this paper is trying to address.

Detecting challenging near-boundary OOD samples accurately

Generating synthetic OOD data using foundation models

Improving CLIP model discrimination between InD and OOD

Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates synthetic OOD samples using foundation models

Fine-tunes CLIP with boundary-aligned OOD samples

Improves OOD detection via noise-adjusted gradient refinement
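The gradient-based noise refinement listed above can be sketched on a toy problem. This is a sketch under strong assumptions: a fixed linear map (`toy_logits`, hypothetical) stands in for the real pipeline of diffusion decoding followed by CLIP classification, the boundary is modeled as a target energy value, and gradients are taken by finite differences rather than backpropagation. It only illustrates the idea of nudging a noise vector until its energy score sits at a chosen boundary level.

```python
import math

def energy(logits):
    """Energy score with temperature 1: -log(sum_i exp(logit_i))."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(v - m) for v in logits)))

def toy_logits(z):
    # Hypothetical stand-in for "generate image from noise z, then classify".
    W = [[2.0, 0.0], [0.0, 2.0], [-1.0, -1.0]]
    return [sum(w * x for w, x in zip(row, z)) for row in W]

def boundary_loss(z, target):
    """Squared distance between the energy of z and the boundary energy."""
    return (energy(toy_logits(z)) - target) ** 2

def numerical_grad(f, z, eps=1e-5):
    """Central finite-difference gradient of f at z."""
    grad = []
    for i in range(len(z)):
        hi = list(z)
        lo = list(z)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def refine_noise(z, target, lr=0.05, steps=200):
    """Gradient descent on the noise so its energy approaches the target."""
    z = list(z)
    for _ in range(steps):
        g = numerical_grad(lambda v: boundary_loss(v, target), z)
        z = [x - lr * gi for x, gi in zip(z, g)]
    return z

z0 = [3.0, 0.5]   # initial noise: confidently classified (low energy)
target = -2.0     # assumed boundary energy level
z_ref = refine_noise(z0, target)
```

After refinement, `energy(toy_logits(z_ref))` sits close to `target`, whereas the initial noise `z0` is far from it. In a real setting the finite-difference gradient would be replaced by automatic differentiation through the generator and the classifier.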

🔎 Similar Papers

Diffusion based Semantic Outlier Generation via Nuisance Awareness for Out-of-Distribution Detection