Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of vision-language models under domain shift, a challenge that existing test-time adaptation methods handle poorly: they are often computationally expensive, reliant on backpropagation, and confined to unimodal settings. To overcome these limitations, the authors propose TaTa, a novel approach that introduces Brownian distance covariance into multimodal test-time adaptation for the first time, enabling efficient cross-domain alignment without any training or gradient updates. TaTa further enhances semantic representation and cross-domain reasoning through attribute-augmented prompting, dynamic clustering, and refined pseudo-labeling. Extensive experiments demonstrate that TaTa substantially reduces computational overhead while achieving state-of-the-art performance on both in-domain and cross-dataset generalization benchmarks.

📝 Abstract
Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on single modalities. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance (a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances) to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in domain and cross-dataset generalization.
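The paper does not include code here, but the statistic it builds on, sample Brownian distance covariance (Székely & Rizzo), is simple to compute: form pairwise Euclidean distance matrices for two samples, double-center each, and average their elementwise product. A minimal numpy sketch of that statistic (not of the TaTa pipeline itself, whose clustering and prompting details are not given on this page):

```python
import numpy as np

def distance_covariance_sq(X, Y):
    """Sample squared Brownian distance covariance dCov^2_n(X, Y).

    X: (n, p) array, Y: (n, q) array -- paired samples, any dimensions.
    """
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    # Pairwise Euclidean distance matrices a_ij = |X_i - X_j|, b_ij = |Y_i - Y_j|.
    a = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    b = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Double-centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # dCov^2 is the mean of the elementwise product of the centered matrices.
    return (A * B).mean()

def distance_correlation(X, Y):
    """Distance correlation in [0, 1]; zero iff X and Y are independent
    (in the population version)."""
    dcov2 = distance_covariance_sq(X, Y)
    denom = np.sqrt(distance_covariance_sq(X, X) * distance_covariance_sq(Y, Y))
    return 0.0 if denom == 0 else np.sqrt(dcov2 / denom)
```

Unlike Pearson correlation, this statistic picks up purely nonlinear dependence (e.g. `y = x**2`), which is the property the abstract invokes for aligning image and text features across domains without gradient updates.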
Problem

Research questions and friction points this paper is trying to address.

domain shift
vision-language models
test-time adaptation
computational efficiency
multimodal adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free Adaptation
Brownian Distance Covariance
Vision-Language Models
Test-Time Adaptation
Domain Generalization
Yi Zhang
College of Computer Science and Software Engineering, Shenzhen University, China
Chun-Wun Cheng
PhD student, University of Cambridge
Implicit Deep Learning, Applied Mathematics, Generative AI
Angelica I. Avilés-Rivero
Yau Mathematical Sciences Center, Tsinghua University, Beijing, China
Zhihai He
Southern University of Science and Technology
Deep learning, computer vision, machine learning, smart cyber-physical systems
Liang Zhang
College of Computer Science and Software Engineering, Shenzhen University, China