🤖 AI Summary
Detecting out-of-distribution (OOD) samples at test time remains challenging: ID-only methods suffer from limited discriminative capacity, while leveraging external anomaly data introduces privacy risks and task misalignment. Method: We propose AUTO, the first framework for *test-time adaptive OOD detection*, which requires no predefined anomaly data. Instead, it dynamically leverages unlabeled, real-world OOD samples from the incoming test stream to continuously refine the detector online. Contributions/Results: AUTO introduces three key components: (i) an in-out-aware filter that selects pseudo-ID and pseudo-OOD samples from the stream; (ii) a dynamic memory module enabling robust replay of historical OOD patterns; and (iii) a prediction-alignment objective that preserves model stability. Guided by pseudo-labels, online gradient calibration, and test-time model adaptation, AUTO significantly outperforms state-of-the-art methods across standard, multi-OOD, and temporal OOD benchmarks, achieving superior detection accuracy and generalization robustness.
📝 Abstract
Out-of-distribution (OOD) detection aims to identify test samples that do not belong to any of the training in-distribution (ID) classes. Prior efforts focus on regularizing models with ID data only, and largely underperform counterparts that utilize auxiliary outliers. However, data-safety and privacy concerns make it infeasible to collect task-specific outliers in advance for every scenario, and using task-irrelevant outliers leads to inferior OOD detection performance. To address this issue, we present a new setup called test-time OOD detection, which allows the deployed model to utilize real OOD data from the unlabeled data stream during testing. We propose Adaptive Outlier Optimization (AUTO), which enables continuous adaptation of the OOD detector. Specifically, AUTO consists of three key components: 1) an in-out-aware filter that selectively annotates test samples as pseudo-ID or pseudo-OOD and triggers an update whenever a pseudo-OOD sample is encountered; 2) a dynamically updated memory that counters the catastrophic forgetting caused by frequent parameter updates; 3) a prediction-aligning objective that calibrates the coarse OOD objective during testing. Extensive experiments show that AUTO significantly improves OOD detection performance over state-of-the-art methods, and evaluations on more complex scenarios (e.g., multi-OOD, time-series OOD) further demonstrate its superiority.
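The three components above can be sketched as a tiny test-time loop on a linear classifier. Everything in the sketch below, including the energy score, the two thresholds `tau_id`/`tau_ood`, the replay-memory size, and the pull toward a frozen weight snapshot standing in for the prediction-aligning objective, is a hypothetical illustration of the general idea, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

class AutoSketch:
    """Illustrative test-time adaptive OOD detector (not the paper's code).

    Filter: a sample with energy <= tau_id is pseudo-ID, >= tau_ood is
    pseudo-OOD; each pseudo-OOD sample triggers an online update that
    replays a small memory and is pulled back toward a frozen snapshot.
    """

    def __init__(self, w, tau_id=-5.0, tau_ood=-2.0, lr=0.05, mem_size=16):
        self.w = w.astype(float).copy()   # linear classifier weights (C x D)
        self.w0 = self.w.copy()           # frozen snapshot for alignment
        self.tau_id, self.tau_ood = tau_id, tau_ood
        self.lr, self.mem_size = lr, mem_size
        self.memory = []                  # replay buffer of pseudo-OOD samples

    def energy(self, x):
        # negative log-sum-exp of the logits: low for ID-like, high for OOD-like
        z = self.w @ x
        return -np.log(np.exp(z).sum())

    def step(self, x):
        s = self.energy(x)
        if s <= self.tau_id:              # in-out-aware filter: pseudo-ID
            return "ID", s
        if s >= self.tau_ood:             # pseudo-OOD: trigger an update
            self.memory.append(x)
            self.memory = self.memory[-self.mem_size:]   # bounded memory
            for xo in self.memory:        # replay historical pseudo-OOD
                z = self.w @ xo
                p = np.exp(z) / np.exp(z).sum()
                # gradient ascent on energy: raise energy of pseudo-OOD
                self.w -= self.lr * np.outer(p, xo)
            # crude stand-in for prediction alignment: pull toward snapshot
            self.w += 0.1 * (self.w0 - self.w)
            return "OOD", s
        return "uncertain", s             # neither threshold crossed
```

With small random weights a typical sample lands above `tau_ood`, so the update fires and its energy rises on the next scoring pass; a sample that produces large logits (energy below `tau_id`) passes the filter as pseudo-ID without any update.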