IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of models under test-time distribution shifts by proposing an efficient adaptation method based on singular value decomposition (SVD). The approach decouples linear layers in Vision Transformers into fixed singular vectors and learnable singular values, thereby constructing an intrinsic spectral mixture-of-experts architecture. To mitigate feature collapse, a diversity-maximization loss is introduced, and a domain-aware spectral code retrieval mechanism is designed to reuse knowledge from previously observed domains. Requiring fine-tuning of only 0.26% of the model parameters, the method achieves state-of-the-art performance across multiple distribution-shift benchmarks, improving accuracy by 3.4 and 2.4 percentage points in the continual and gradual test-time adaptation scenarios, respectively.
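The core mechanism in the summary above — freezing the singular vectors of each linear layer and learning only the singular values — can be sketched in a few lines. This is a minimal NumPy illustration of the idea, not the paper's implementation; function names and shapes are illustrative assumptions.

```python
import numpy as np

def decompose_linear(W):
    # Hypothetical sketch of the paper's idea: factor a linear layer's
    # weight as W = U @ diag(s) @ Vt via SVD. U and Vt stay frozen;
    # only the singular values s are treated as trainable parameters.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def spectral_forward(x, U, s, Vt):
    # Each rank-one term (u_i, sigma_i, v_i) acts as one "spectral expert";
    # rescaling s reweights the experts without touching the frozen bases.
    return x @ (U * s) @ Vt

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))   # toy linear-layer weight
x = rng.standard_normal((4, 8))   # toy batch of inputs
U, s, Vt = decompose_linear(W)
# With unchanged singular values the factorization reproduces W exactly.
assert np.allclose(x @ W, spectral_forward(x, U, s, Vt), atol=1e-8)
```

Because only `s` is updated (one scalar per singular direction), the trainable-parameter count is a tiny fraction of the full weight matrix, which is consistent with the 0.26% figure reported above.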

📝 Abstract
Test-time adaptation (TTA) has been widely explored to prevent performance degradation when test data differ from the training distribution. However, fully leveraging the rich representations of large pretrained models with minimal parameter updates remains underexplored. In this paper, we propose Intrinsic Mixture of Spectral Experts (IMSE), which leverages the spectral experts inherently embedded in Vision Transformers. We decompose each linear layer via singular value decomposition (SVD) and adapt only the singular values, while keeping the singular vectors fixed. We further identify a key limitation of entropy minimization in TTA: it often induces feature collapse, causing the model to rely on domain-specific features rather than class-discriminative features. To address this, we propose a diversity maximization loss based on expert-input alignment, which encourages diverse utilization of spectral experts during adaptation. In the continual test-time adaptation (CTTA) scenario, beyond preserving pretrained knowledge, it is crucial to retain and reuse knowledge from previously observed domains. We introduce Domain-Aware Spectral Code Retrieval, which estimates input distributions to detect domain shifts, and retrieves adapted singular values for rapid adaptation. Consequently, our method achieves state-of-the-art performance on various distribution-shift benchmarks under the TTA setting. In CTTA and Gradual CTTA, it further improves accuracy by 3.4 percentage points (pp) and 2.4 pp, respectively, while requiring 385 times fewer trainable parameters. Our code is available at https://github.com/baek85/IMSE.
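The Domain-Aware Spectral Code Retrieval described in the abstract stores adapted singular values per observed domain and retrieves them when a similar input distribution reappears. The sketch below illustrates that retrieval pattern under simplifying assumptions: domains are summarized by mean feature statistics and compared with a Euclidean distance and a fixed threshold, which is an illustrative choice, not the paper's exact formulation.

```python
import numpy as np

class SpectralCodeBank:
    """Hypothetical sketch of domain-aware spectral code retrieval:
    store adapted singular values keyed by an estimate of the input
    feature distribution, and return the closest stored code when a
    similar domain reappears."""

    def __init__(self, threshold=1.0):
        self.codes = []          # list of (domain_stats, singular_values)
        self.threshold = threshold

    def _distance(self, a, b):
        # Compare domains by their summary statistics (an assumption;
        # the paper's shift-detection criterion may differ).
        return float(np.linalg.norm(a - b))

    def retrieve(self, stats):
        # Return stored singular values for the nearest known domain,
        # or None when the shift exceeds the threshold (a new domain).
        if not self.codes:
            return None
        dists = [self._distance(stats, s) for s, _ in self.codes]
        i = int(np.argmin(dists))
        return self.codes[i][1] if dists[i] < self.threshold else None

    def store(self, stats, singular_values):
        # Cache the singular values adapted on the current domain.
        self.codes.append((np.asarray(stats), np.asarray(singular_values).copy()))

bank = SpectralCodeBank(threshold=0.5)
bank.store(np.zeros(4), np.ones(4))            # previously adapted domain
assert bank.retrieve(np.full(4, 0.1)) is not None   # near a known domain
assert bank.retrieve(np.full(4, 2.0)) is None       # unseen domain
```

Retrieving cached singular values on re-encountered domains is what enables the "rapid adaptation" the abstract mentions: the model resumes from a previously adapted state instead of re-adapting from scratch.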
Problem

Research questions and friction points this paper is trying to address.

Test-Time Adaptation
Feature Collapse
Continual Adaptation
Distribution Shift
Pretrained Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Test-Time Adaptation
Spectral Experts
Singular Value Decomposition
Diversity Maximization
Domain-Aware Retrieval
Sunghyun Baek
Korea Advanced Institute of Science and Technology (KAIST)

Jaemyung Yu
NAVER AI Lab
representation learning

Seunghee Koh
KAIST
AI

Minsu Kim
LG Energy Solution

Hyeonseong Jeon
LG Energy Solution

Junmo Kim
School of Electrical Engineering, KAIST
Statistical Signal Processing, Image Processing, Computer Vision, Machine Learning, Information Theory