🤖 AI Summary
This work addresses a limitation of existing test-time out-of-distribution (OOD) detection methods: they rely on a fixed set of external OOD labels and therefore struggle to adapt to an open, dynamically evolving semantic space. To overcome this, we propose TTL (Test-time Textual Learning), a framework that removes the need for external OOD labels by learning OOD textual semantics directly from unlabeled test streams. TTL updates learnable prompts using pseudo-labeled test samples to continuously capture emerging OOD knowledge. It also incorporates an OOD knowledge purification strategy, which selects reliable OOD samples to suppress pseudo-label noise, and an OOD Textual Knowledge Bank, which stores high-quality textual features to keep score calibration stable across batches. Extensive experiments on two standard benchmarks with nine OOD datasets show that TTL consistently achieves state-of-the-art performance, supporting the value of text-driven adaptation for robust test-time OOD detection.
📝 Abstract
Vision-language models (VLMs) such as CLIP exhibit strong out-of-distribution (OOD) detection capabilities by aligning visual and textual representations. Recent CLIP-based test-time adaptation methods further improve detection performance by incorporating external OOD labels. However, such labels are finite and fixed, while the real OOD semantic space is inherently open-ended. Consequently, fixed labels fail to represent the diverse and evolving OOD semantics encountered in test streams. To address this limitation, we introduce Test-time Textual Learning (TTL), a framework that dynamically learns OOD textual semantics from unlabeled test streams without relying on external OOD labels. TTL updates learnable prompts using pseudo-labeled test samples to capture emerging OOD knowledge. To suppress the noise introduced by pseudo-labels, we propose an OOD knowledge purification strategy that selects only reliable OOD samples for adaptation. In addition, TTL maintains an OOD Textual Knowledge Bank that stores high-quality textual features, providing stable score calibration across batches. Extensive experiments on two standard benchmarks with nine OOD datasets demonstrate that TTL consistently achieves state-of-the-art performance, highlighting the value of textual adaptation for robust test-time OOD detection. Our code is available at https://github.com/figec/TTL.
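The abstract does not give the scoring or selection rules, so the following is only a minimal sketch of how the three pieces it names (an OOD score from text features, purification of pseudo-labels, and a bounded bank of textual features) might fit together. Every name and parameter here (`ood_score`, `select_reliable_ood`, `TextualKnowledgeBank`, `temperature`, `threshold`, `margin`, `capacity`) is hypothetical, the softmax-mass score and the margin rule are assumptions rather than the method of the paper, and prompt learning itself is omitted.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ood_score(image_feat, id_text_feats, ood_text_feats, temperature=0.01):
    """Assumed score: softmax probability mass that falls on OOD text features.

    Returns 0.0 when no OOD text features are available yet.
    """
    texts = list(id_text_feats) + list(ood_text_feats)
    sims = [cosine(image_feat, t) for t in texts]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in sims]
    return sum(exps[len(id_text_feats):]) / sum(exps)


def select_reliable_ood(scores, threshold=0.5, margin=0.3):
    """Assumed purification rule: keep indices whose score clears threshold + margin."""
    return [i for i, s in enumerate(scores) if s >= threshold + margin]


class TextualKnowledgeBank:
    """Bounded FIFO store of textual features (hypothetical structure)."""

    def __init__(self, capacity=8):
        self.features = deque(maxlen=capacity)  # oldest entries are evicted first

    def add(self, feat):
        self.features.append(list(feat))


# Toy 3-D features: two ID class texts and one OOD text held in the bank.
id_texts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
bank = TextualKnowledgeBank(capacity=2)
bank.add([0.0, 0.0, 1.0])

batch = [[0.1, 0.0, 1.0], [1.0, 0.1, 0.0]]  # first looks OOD, second looks ID
scores = [ood_score(x, id_texts, bank.features) for x in batch]
reliable = select_reliable_ood(scores)  # indices treated as trustworthy OOD samples
```

In this toy setup only the first image would be passed on for adaptation. In a real pipeline the features would come from a CLIP image encoder and from the learned prompts, which this sketch does not model.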