Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address out-of-distribution (OOD) detection under large semantic spaces and complex decision boundaries, this paper proposes a parameter-efficient method. It leverages frozen pre-trained DINOv2 features to construct a highly discriminative representation space; introduces a Mixture of Feature Experts (MoFE) module that partitions the feature space into class-specific subspaces to refine decision boundaries; and incorporates a Dynamic-β Mixup strategy that adaptively modulates interpolation weights to improve learning on hard samples. Evaluated on benchmarks including CIFAR-10/100 and ImageNet-O, the method achieves significant improvements over state-of-the-art approaches—particularly in fine-grained and cross-domain OOD settings—while maintaining robustness. Key contributions include: (i) empirical validation of frozen DINOv2 features for OOD detection; (ii) a scalable subspace-based expert architecture; and (iii) a dynamic weighted augmentation mechanism. Collectively, these advances establish a paradigm for equipping vision foundation models with reliable OOD detection capabilities.

📝 Abstract
Pre-trained vision foundation models have transformed many computer vision tasks. Despite their strong ability to learn discriminative and generalizable features crucial for out-of-distribution (OOD) detection, their impact on this task remains underexplored. Motivated by this gap, we systematically investigate representative vision foundation models for OOD detection. Our findings reveal that a pre-trained DINOv2 model, even without fine-tuning on in-domain (ID) data, naturally provides a highly discriminative feature space for OOD detection, achieving performance comparable to existing state-of-the-art methods without requiring complex designs. Beyond this, we explore how fine-tuning foundation models on ID data can enhance OOD detection. However, we observe that the performance of vision foundation models remains unsatisfactory in scenarios with a large semantic space. This is due to the increased complexity of decision boundaries as the number of categories grows, which complicates the optimization process. To mitigate this, we propose the Mixture of Feature Experts (MoFE) module, which partitions features into subspaces, effectively capturing complex data distributions and refining decision boundaries. Further, we introduce a Dynamic-β Mixup strategy, which samples interpolation weights from a dynamic beta distribution. This adapts to varying levels of learning difficulty across categories, improving feature learning for more challenging categories. Extensive experiments demonstrate the effectiveness of our approach, significantly outperforming baseline methods.
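The MoFE idea described above — routing frozen backbone features through lightweight experts that each handle a feature subspace — can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the gating network, the number of experts, and the use of plain linear experts over NumPy arrays are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MixtureOfFeatureExperts:
    """Toy MoFE-style head (hypothetical): a gate routes frozen
    backbone features to lightweight linear experts, each of which
    scores the input within its own feature subspace."""

    def __init__(self, dim, n_experts, n_classes):
        self.gate = rng.normal(0, 0.02, (dim, n_experts))
        self.experts = rng.normal(0, 0.02, (n_experts, dim, n_classes))

    def forward(self, feats):
        # feats: (batch, dim) features from a frozen DINOv2-style encoder
        weights = softmax(feats @ self.gate)                     # (batch, n_experts)
        logits = np.einsum("bd,edc->bec", feats, self.experts)   # per-expert logits
        return (weights[:, :, None] * logits).sum(axis=1)        # gated mixture

feats = rng.normal(size=(4, 16))
moe = MixtureOfFeatureExperts(dim=16, n_experts=3, n_classes=5)
out = moe.forward(feats)
print(out.shape)  # (4, 5)
```

The design intuition matches the abstract: each expert only has to model decision boundaries within its subspace, so the overall boundary complexity is divided across experts rather than carried by a single classifier head.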
Problem

Research questions and friction points this paper is trying to address.

Enhancing vision foundation models for out-of-distribution detection tasks
Addressing performance degradation in large semantic space scenarios
Improving feature learning for complex data distributions and boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Feature Experts (MoFE) partitions features into subspaces
Dynamic-β Mixup samples interpolation weights from a dynamic beta distribution
Frozen DINOv2 features shown to be highly discriminative for OOD detection
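The Dynamic-β Mixup strategy listed above can be sketched in a few lines. The core idea from the abstract is that the beta distribution's parameter adapts to per-category learning difficulty; the specific difficulty-to-parameter schedule below is a hypothetical choice for illustration, not the one from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_beta_mixup(x1, y1, x2, y2, difficulty):
    """Toy Dynamic-beta Mixup (hypothetical schedule): harder
    categories (difficulty near 1) get a larger beta parameter,
    concentrating lambda near 0.5 for stronger interpolation;
    easier ones keep lambda near the endpoints (plain samples)."""
    alpha = 0.2 + 1.8 * difficulty          # assumed mapping into [0.2, 2.0]
    lam = rng.beta(alpha, alpha)            # interpolation weight in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2         # mix inputs (or features)
    y = lam * y1 + (1.0 - lam) * y2         # mix one-hot labels
    return x, y, lam

x1, x2 = rng.normal(size=(16,)), rng.normal(size=(16,))
y1, y2 = np.eye(5)[0], np.eye(5)[3]
x, y, lam = dynamic_beta_mixup(x1, y1, x2, y2, difficulty=0.9)
print(round(float(y.sum()), 6))  # mixed label still sums to 1.0
```

Compared with standard Mixup, where the beta parameter is a fixed hyperparameter, making it a function of difficulty lets hard categories see more aggressively blended samples while easy ones are left closer to the original distribution.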