M3OOD: Automatic Selection of Multimodal OOD Detectors

📅 2025-08-16
🤖 AI Summary
Multimodal out-of-distribution (OOD) detection involves diverse distribution shifts, detector performance is hard to predict without labels, and no single detector is best in every scenario. Method: The paper proposes M3OOD, a meta-learning framework that automatically selects a multimodal OOD detector. It represents each dataset by multimodal embeddings combined with handcrafted meta-features describing distributional and cross-modal characteristics, and trains a lightweight predictor on historical detector performance so that it can recommend a suitable detector for an unseen distribution shift. Contribution/Results: Across 12 test scenarios, M3OOD consistently outperforms 10 baselines with minimal computational overhead.

📝 Abstract
Out-of-distribution (OOD) robustness is a critical challenge for modern machine learning systems, particularly as they increasingly operate in multimodal settings involving inputs like video, audio, and sensor data. Many OOD detection methods have been proposed, each with a different design targeting particular distribution shifts. A single OOD detector may not prevail across all scenarios; how, then, can we automatically select a suitable OOD detection model for each distribution shift? Because the OOD detection task is inherently unsupervised, it is difficult to predict model performance and to find a universally best model. Systematically comparing models on new, unseen data is also costly or even impractical. To address this challenge, we introduce M3OOD, a meta-learning-based framework for OOD detector selection in multimodal settings. Meta-learning offers a solution by learning from historical model behaviors, enabling rapid adaptation to new data distribution shifts with minimal supervision. Our approach represents datasets by combining multimodal embeddings with handcrafted meta-features that capture distributional and cross-modal characteristics. By leveraging historical performance across diverse multimodal benchmarks, M3OOD can recommend suitable detectors for a new data distribution shift. Experimental evaluation demonstrates that M3OOD consistently outperforms 10 competitive baselines across 12 test scenarios with minimal computational overhead.
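The abstract does not list the specific meta-features, so the following is only a minimal sketch of what "distributional and cross-modal" descriptors of a dataset could look like. The function names, the choice of statistics (embedding-norm mean, standard deviation, skewness, average feature variance), and the use of linear CKA as a cross-modal alignment score are all assumptions made for illustration, not the paper's actual feature set.

```python
import numpy as np

def modality_stats(emb):
    """Distributional statistics of one modality's embeddings (n_samples x dim).

    Hypothetical feature choice: statistics of the per-sample embedding norm
    plus the average per-dimension variance.
    """
    norms = np.linalg.norm(emb, axis=1)
    std = norms.std()
    centered = norms - norms.mean()
    skew = float((centered ** 3).mean() / std ** 3) if std > 0 else 0.0
    return np.array([norms.mean(), std, skew, emb.var(axis=0).mean()])

def linear_cka(x, y):
    """Linear CKA between two embedding matrices with the same number of rows.

    Used here as an assumed cross-modal alignment score; it lies in [0, 1]
    and does not require the two modalities to share a dimension.
    """
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    denom = np.linalg.norm(x.T @ x, "fro") * np.linalg.norm(y.T @ y, "fro")
    return float(cross / denom) if denom > 0 else 0.0

def meta_features(video_emb, audio_emb):
    """Concatenate per-modality statistics with one cross-modal score."""
    return np.concatenate([
        modality_stats(video_emb),
        modality_stats(audio_emb),
        [linear_cka(video_emb, audio_emb)],
    ])

# Synthetic embeddings stand in for real video/audio encoder outputs.
rng = np.random.default_rng(0)
video = rng.normal(size=(200, 16))
audio = rng.normal(size=(200, 8))
features = meta_features(video, audio)
```

The resulting fixed-length vector can be computed without labels, which is the property that matters for an unsupervised selection setting.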
Problem

Research questions and friction points this paper is trying to address.

Automatically select OOD detectors for multimodal data
Predict model performance without supervision in OOD detection
Compare models efficiently on unseen multimodal distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-learning framework for OOD detector selection
Multimodal embeddings with handcrafted meta-features
Leverages historical performance for rapid adaptation
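The selection idea summarized above can be sketched as follows: fit a predictor that maps dataset meta-features to each detector's historical performance, then recommend the detector with the highest predicted score on a new dataset. This is a hedged illustration only. The ridge regressor, the placeholder detector names, and the synthetic performance table are assumptions; the summary does not specify which predictor M3OOD uses.

```python
import numpy as np

def fit_ridge(meta, perf, alpha=1e-3):
    """Fit one ridge regression per detector (columns of perf) with a bias term.

    meta: (n_datasets, n_meta_features) historical dataset descriptors
    perf: (n_datasets, n_detectors) historical detector scores, e.g. AUROC
    """
    x = np.hstack([meta, np.ones((meta.shape[0], 1))])
    reg = alpha * np.eye(x.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(x.T @ x + reg, x.T @ perf)

def recommend(weights, new_meta, names):
    """Return the detector with the highest predicted score, plus all predictions."""
    predicted = np.append(new_meta, 1.0) @ weights
    return names[int(np.argmax(predicted))], predicted

# Synthetic "historical" data: 30 past datasets, 5 meta-features, 3 detectors.
rng = np.random.default_rng(0)
names = ["detector_a", "detector_b", "detector_c"]  # placeholder names
meta = rng.normal(size=(30, 5))
true_w = rng.normal(size=(5, 3))
perf = meta @ true_w + 0.01 * rng.normal(size=(30, 3))

weights = fit_ridge(meta, perf)

# A new, unseen dataset is described only by its meta-features.
new_meta = rng.normal(size=5)
best, predicted = recommend(weights, new_meta, names)
```

Because the predictor is fit once on past results, recommending a detector for a new dataset costs a single matrix-vector product rather than running and comparing every detector.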
Yuehan Qin
University of Southern California
Li Li
University of Southern California
Defu Cao
Peking University; MBZUAI; University of Southern California; Caltech
Tiankai Yang
University of Southern California
Yue Zhao
University of Southern California