BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis

📅 2025-07-11

📈 Citations: 0

✨ Influential: 0

career value

179K/year

🤖 AI Summary

Vision-language models (e.g., CLIP) suffer severe degradation in zero-shot recognition under temporal distribution shifts (e.g., gradual lighting or seasonal changes). Existing continual test-time adaptation (CTTA) methods neglect temporal continuity, leading to three critical limitations: memory constraints, miscalibrated entropy-based confidence, and static feature representations. This work formally introduces the *continual-temporal test-time adaptation (CT-TTA)* problem and proposes a Bayesian inference–driven adaptive framework. It incorporates data-free class-conditional Gaussian mixture estimation, hypothesis-testing-guided covariance structure selection, and calibrated inference to ensure prediction consistency and dynamic representation alignment. The method integrates Gaussian discriminant analysis, statistical hypothesis testing, and self-paced normalization layer optimization. Evaluated on four temporal benchmarks and ten standard TTA datasets, it significantly outperforms state-of-the-art approaches while maintaining high efficiency and robustness.

Technology Category

Application Category

📝 Abstract

Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under extit{temporally evolving distribution shifts} common in real-world scenarios (e.g., gradual illumination or seasonal changes). Existing continual test-time adaptation (CTTA) methods are typically built around sudden and severe distribution shifts and neglect temporal continuity, leading to three core defects: limited memory cache restricts long-range distribution modeling, causing catastrophic forgetting; entropy-based confidence becomes unreliable under temporal drift, worsening error accumulation; and static visual representations misalign with evolving inputs. We formalize this practical problem as extit{Continual-Temporal Test-Time Adaptation (CT-TTA)}, where test distributions evolve gradually over time. To address it, we propose extit{BayesTTA}, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations. Specifically, BayesTTA incrementally estimates class-conditional Gaussian mixture distributions without storing raw data, adaptively selects covariance structures through statistical hypothesis testing, and performs calibrated inference using Gaussian discriminant analysis (GDA). These calibrated predictions supervise self-paced adaptation of normalization layers, ensuring efficient and stable representation alignment. We establish a comprehensive CT-TTA benchmark across four temporally evolving datasets and further evaluate generalization on ten standard TTA datasets. Extensive experiments show that BayesTTA consistently outperforms state-of-the-art methods, achieving significant gains while maintaining efficiency. Code is available at href{https://github.com/cuishuang99/BayesTTA}{https://github.com/cuishuang99/BayesTTA}.

Problem

Research questions and friction points this paper is trying to address.

Addresses vision-language model degradation under temporal distribution shifts

Overcomes limitations of existing methods in continual test-time adaptation

Ensures temporally consistent predictions and dynamic representation alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian adaptation framework for temporal consistency

Gaussian discriminant analysis for calibrated inference

Self-paced adaptation of normalization layers

🔎 Similar Papers

Efficient Open Set Single Image Test Time Adaptation of Vision Language Models