Training-Free Test-Time Adaptation with Brownian Distance Covariance in Vision-Language Models

📅 2026-01-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance degradation of vision-language models under domain shift, a challenge that existing test-time adaptation methods handle poorly: they are often computationally expensive, reliant on backpropagation, and confined to unimodal settings. To overcome these limitations, the authors propose TaTa, a novel approach that introduces Brownian distance covariance into multimodal test-time adaptation for the first time, enabling efficient cross-domain alignment without any training or gradient updates. TaTa further enhances semantic representation and cross-domain reasoning through attribute-augmented prompting, dynamic clustering, and refined pseudo-labeling. Extensive experiments demonstrate that TaTa substantially reduces computational overhead while achieving state-of-the-art performance on both in-domain and cross-dataset generalization benchmarks.

📝 Abstract
Vision-language models suffer performance degradation under domain shift, limiting real-world applicability. Existing test-time adaptation methods are computationally intensive, rely on back-propagation, and often focus on single modalities. To address these issues, we propose Training-free Test-Time Adaptation with Brownian Distance Covariance (TaTa). TaTa leverages Brownian Distance Covariance (a powerful statistical measure that captures both linear and nonlinear dependencies via pairwise distances) to dynamically adapt VLMs to new domains without training or back-propagation. This not only improves efficiency but also enhances stability by avoiding disruptive weight updates. TaTa further integrates attribute-enhanced prompting to improve vision-language inference with descriptive visual cues. Combined with dynamic clustering and pseudo-label refinement, it effectively recalibrates the model for novel visual contexts. Experiments across diverse datasets show that TaTa significantly reduces computational cost while achieving state-of-the-art performance in domain and cross-dataset generalization.
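The paper does not include code here, but the statistic it builds on, sample Brownian distance covariance (Székely & Rizzo), is simple to compute: form pairwise Euclidean distance matrices for two samples, double-center each, and average their elementwise product. A minimal numpy sketch of that statistic (not of the TaTa pipeline itself, whose clustering and prompting details are not given on this page):

```python
import numpy as np

def distance_covariance_sq(X, Y):
    """Sample squared Brownian distance covariance dCov^2_n(X, Y).

    X: (n, p) array, Y: (n, q) array -- paired samples, any dimensions.
    """
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    # Pairwise Euclidean distance matrices a_ij = |X_i - X_j|, b_ij = |Y_i - Y_j|.
    a = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    b = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
    # Double-centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    # dCov^2 is the mean of the elementwise product of the centered matrices.
    return (A * B).mean()

def distance_correlation(X, Y):
    """Distance correlation in [0, 1]; zero iff X and Y are independent
    (in the population version)."""
    dcov2 = distance_covariance_sq(X, Y)
    denom = np.sqrt(distance_covariance_sq(X, X) * distance_covariance_sq(Y, Y))
    return 0.0 if denom == 0 else np.sqrt(dcov2 / denom)
```

Unlike Pearson correlation, this statistic picks up purely nonlinear dependence (e.g. `y = x**2`), which is the property the abstract invokes for aligning image and text features across domains without gradient updates.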
Problem

Research questions and friction points this paper is trying to address.

domain shift
vision-language models
test-time adaptation
computational efficiency
multimodal adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-Free Adaptation
Brownian Distance Covariance
Vision-Language Models
Test-Time Adaptation
Domain Generalization
Yi Zhang
College of Computer Science and Software Engineering, Shenzhen University, China
Chun-Wun Cheng
PhD student, University of Cambridge
Implicit Deep Learning, Applied Mathematics, Generative AI
Angelica I. Avilés-Rivero
Yau Mathematical Sciences Center, Tsinghua University, Beijing, China
Zhihai He
Southern University of Science and Technology
Deep learning, computer vision, machine learning, smart cyber-physical systems
Liang Zhang
College of Computer Science and Software Engineering, Shenzhen University, China