🤖 AI Summary
To address the limitations of vision-language models (VLMs) under test-time distribution shifts, namely the need for additional training, high computational overhead, and the constrained expressivity of the original feature space, this paper proposes a fully training-free test-time adaptation method. The approach comprises three core components: (1) inter-class discrepancy-driven feature space rotation to enhance discriminability; (2) an interpretable orthogonal basis transformation to overcome representational bottlenecks in the native feature space; and (3) a dynamic sample queue that enables online modeling of representative samples across classes. To the authors' knowledge, this is the first method to achieve test-time feature space reconstruction with zero parameter updates. Extensive experiments across multiple benchmarks demonstrate significant improvements over state-of-the-art methods, achieving both higher accuracy and substantially lower computational cost, validating the method's dual advantages in efficiency and generalization.
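The summary does not include code, so as a rough illustration only: the sketch below shows one plausible reading of a training-free orthogonal basis transformation, where an orthonormal basis is built (here via QR decomposition, an assumption, not necessarily the authors' construction) from hypothetical class prototype vectors, and features are classified by cosine similarity in the transformed space with no parameter updates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class prototypes (e.g., per-class text embeddings from a VLM).
num_classes, dim = 4, 16
prototypes = rng.normal(size=(num_classes, dim))

# Build an orthonormal basis spanning the class prototypes via QR decomposition,
# then express the prototypes in that basis. This is a stand-in for the paper's
# "orthogonal basis transformation"; no model parameters are updated.
Q, _ = np.linalg.qr(prototypes.T)   # (dim, num_classes), orthonormal columns
rotated_protos = prototypes @ Q     # prototypes mapped into the new basis

def classify(feature: np.ndarray) -> int:
    """Cosine-similarity classification in the transformed feature space."""
    z = feature @ Q
    sims = rotated_protos @ z / (
        np.linalg.norm(rotated_protos, axis=1) * np.linalg.norm(z) + 1e-8
    )
    return int(np.argmax(sims))

# A test feature close to prototype 2 is assigned to class 2.
feature = prototypes[2] + 0.01 * rng.normal(size=dim)
print(classify(feature))  # -> 2
```

Because the columns of `Q` are orthonormal, inner products among vectors in the prototype span are preserved, so the transformation reorganizes the representation without discarding the class structure.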
📝 Abstract
As vision-language models (VLMs) are increasingly applied to downstream tasks, VLM-based test-time adaptation methods have attracted growing attention for their ability to handle distribution shifts at test time. Although prior approaches have made progress, they typically either demand substantial computational resources or are constrained by the limitations of the original feature space, rendering them less effective for test-time adaptation. To address these challenges, we propose a training-free feature space rotation with basis transformation for test-time adaptation. By leveraging the inherent distinctions among classes, we reconstruct the original feature space and map it to a new representation, thereby sharpening class differences and providing more effective guidance for the model during testing. Additionally, to better capture relevant information from the various classes, we maintain a dynamic queue that stores representative samples. Experimental results across multiple benchmarks demonstrate that our method outperforms state-of-the-art techniques in both performance and efficiency.
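The abstract mentions a dynamic queue of representative samples but gives no details. A minimal sketch of one possible realization (the class name, capacity-based FIFO eviction, and confidence scores are all assumptions for illustration, not the authors' specification):

```python
from collections import deque

class DynamicQueue:
    """Per-class queue of recent features; a hypothetical sketch, not the
    paper's exact mechanism. Capacity-bounded: oldest entries are evicted
    automatically once a class queue is full."""

    def __init__(self, num_classes: int, capacity: int = 3):
        self.queues = [deque(maxlen=capacity) for _ in range(num_classes)]

    def push(self, cls: int, feature, confidence: float):
        # deque(maxlen=...) drops the oldest item when capacity is exceeded.
        self.queues[cls].append((confidence, feature))

    def representatives(self, cls: int):
        # Stored features for a class, most confident first.
        return [f for _, f in sorted(self.queues[cls],
                                     key=lambda t: t[0], reverse=True)]

q = DynamicQueue(num_classes=2, capacity=2)
q.push(0, "feat_a", 0.9)
q.push(0, "feat_b", 0.5)
q.push(0, "feat_c", 0.8)  # evicts feat_a, the oldest entry
print(q.representatives(0))  # -> ['feat_c', 'feat_b']
```

A real implementation might instead evict the lowest-confidence entry rather than the oldest; the FIFO choice here is purely for brevity.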