BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis

📅 2025-07-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (e.g., CLIP) suffer severe degradation in zero-shot recognition under temporal distribution shifts (e.g., gradual lighting or seasonal changes). Existing continual test-time adaptation (CTTA) methods neglect temporal continuity, leading to three critical limitations: memory constraints, miscalibrated entropy-based confidence, and static feature representations. This work formally introduces the *continual-temporal test-time adaptation (CT-TTA)* problem and proposes a Bayesian inference–driven adaptive framework. It incorporates data-free class-conditional Gaussian mixture estimation, hypothesis-testing-guided covariance structure selection, and calibrated inference to ensure prediction consistency and dynamic representation alignment. The method integrates Gaussian discriminant analysis, statistical hypothesis testing, and self-paced normalization layer optimization. Evaluated on four temporal benchmarks and ten standard TTA datasets, it significantly outperforms state-of-the-art approaches while maintaining high efficiency and robustness.

📝 Abstract
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under *temporally evolving distribution shifts* common in real-world scenarios (e.g., gradual illumination or seasonal changes). Existing continual test-time adaptation (CTTA) methods are typically built around sudden and severe distribution shifts and neglect temporal continuity, leading to three core defects: limited memory cache restricts long-range distribution modeling, causing catastrophic forgetting; entropy-based confidence becomes unreliable under temporal drift, worsening error accumulation; and static visual representations misalign with evolving inputs. We formalize this practical problem as *Continual-Temporal Test-Time Adaptation (CT-TTA)*, where test distributions evolve gradually over time. To address it, we propose *BayesTTA*, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations. Specifically, BayesTTA incrementally estimates class-conditional Gaussian mixture distributions without storing raw data, adaptively selects covariance structures through statistical hypothesis testing, and performs calibrated inference using Gaussian discriminant analysis (GDA). These calibrated predictions supervise self-paced adaptation of normalization layers, ensuring efficient and stable representation alignment. We establish a comprehensive CT-TTA benchmark across four temporally evolving datasets and further evaluate generalization on ten standard TTA datasets. Extensive experiments show that BayesTTA consistently outperforms state-of-the-art methods, achieving significant gains while maintaining efficiency. Code is available at https://github.com/cuishuang99/BayesTTA.
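To make the idea of estimating class-conditional Gaussian statistics "without storing raw data" concrete, here is a minimal sketch using Welford's online update. It is an illustration only, not the authors' implementation: the class name `RunningGaussian`, the diagonal-variance simplification, and the use of one accumulator per class are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class RunningGaussian:
    """Running mean and diagonal variance for one class (Welford's algorithm).

    Hypothetical helper for illustration; a diagonal variance is assumed
    here to keep the sketch short.
    """
    dim: int
    count: int = 0
    mean: list = field(default_factory=list)
    m2: list = field(default_factory=list)  # per-dimension sum of squared deviations

    def __post_init__(self):
        self.mean = [0.0] * self.dim
        self.m2 = [0.0] * self.dim

    def update(self, x):
        """Fold one feature vector into the statistics; x itself is not retained."""
        self.count += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (xi - self.mean[i])

    def variance(self):
        """Population variance per dimension (zeros until two samples arrive)."""
        if self.count < 2:
            return [0.0] * self.dim
        return [m / self.count for m in self.m2]
```

In a test-time setting, one such accumulator could be kept per class and updated with each incoming feature under its predicted label, so memory stays constant no matter how long the stream runs.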
Problem

Research questions and friction points this paper is trying to address.

Addresses vision-language model degradation under temporal distribution shifts
Overcomes limitations of existing methods in continual test-time adaptation
Ensures temporally consistent predictions and dynamic representation alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian adaptation framework for temporal consistency
Gaussian discriminant analysis for calibrated inference
Self-paced adaptation of normalization layers
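
As a rough illustration of how Gaussian discriminant analysis turns class statistics into posterior probabilities, the sketch below scores a feature vector against per-class means under a shared diagonal covariance. The function name `gda_posteriors`, the shared diagonal covariance, and the explicit priors are assumptions for this example; the paper selects the covariance structure through hypothesis testing, which is not reproduced here.

```python
import math


def gda_posteriors(x, means, shared_var, priors):
    """Class posteriors p(y | x) for Gaussians sharing one diagonal covariance.

    Hypothetical helper: `means` is a list of per-class mean vectors,
    `shared_var` the per-dimension variance, `priors` the class priors.
    """
    scores = []
    for mu, prior in zip(means, priors):
        # Log-density up to a constant common to all classes, plus the log prior.
        sq_dist = sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mu, shared_var))
        scores.append(math.log(prior) - 0.5 * sq_dist)
    # Softmax with max-subtraction for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

With a shared covariance the decision boundary between two classes is linear in the features, and a point equidistant from both means with equal priors receives a posterior of 0.5 for each class.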
Shuang Cui
National Renewable Energy Laboratory/University of Texas at Dallas
Thermal Energy Storage, Smart Materials, Decarbonization, Water and Energy Sustainability
Jinglin Xu
National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
Yi Li
National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
Xiongxin Tang
National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
Jiangmeng Li
Institute of Software, Chinese Academy of Sciences
Multi-modal learning, Self-supervised learning, Domain generalization, Causal learning
Jiahuan Zhou
Peking University
Computer Vision, Machine Learning, Deep Learning
Fanjiang Xu
National Key Laboratory of Space Integrated Information System, Institute of Software Chinese Academy of Sciences, and University of Chinese Academy of Sciences, Beijing, China
Fuchun Sun
Department of Computer Science and Technology, Tsinghua University, Beijing, China
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser