Neutralizing Token Aggregation via Information Augmentation for Efficient Test-Time Adaptation

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-time adaptation (TTA) methods for Vision Transformers (ViTs) achieve effectiveness at the cost of high computational overhead; while plug-and-play token aggregation reduces latency, it incurs substantial performance degradation due to information loss. This paper formalizes the resulting problem as Efficient Test-Time Adaptation (ETTA) and proposes NAVIA (Neutralizing token Aggregation via Information Augmentation), a lightweight framework that enables low-latency online adaptation without additional training data. The authors first cast token aggregation as an information-theoretic problem by quantifying the mutual information it loses. Guided by this analysis, NAVIA combines two key components: (i) [CLS] token embedding augmentation to preserve global semantic information, and (ii) adaptive bias injection into the [CLS] token in shallow layers to refine feature representations with minimal overhead. Both are jointly optimized via entropy minimization. Across multiple out-of-distribution benchmarks, NAVIA improves average accuracy by over 2.5% while reducing inference latency by more than 20%, achieving a favorable trade-off between efficiency and robustness.

📝 Abstract
Test-Time Adaptation (TTA) has emerged as an effective solution for adapting Vision Transformers (ViTs) to distribution shifts without additional training data. However, existing TTA methods often incur substantial computational overhead, limiting their applicability in resource-constrained real-world scenarios. To reduce inference cost, plug-and-play token aggregation methods merge redundant tokens in ViTs to reduce the total number of processed tokens. Albeit efficient, such aggregation suffers from significant performance degradation when directly integrated with existing TTA methods. We formalize this problem as Efficient Test-Time Adaptation (ETTA), seeking to preserve the adaptation capability of TTA while reducing inference latency. In this paper, we first provide a theoretical analysis from a novel mutual information perspective, showing that token aggregation inherently leads to information loss, which cannot be fully mitigated by conventional norm-tuning-based TTA methods. Guided by this insight, we propose to Neutralize token Aggregation via Information Augmentation (NAVIA). Specifically, we directly augment the [CLS] token embedding and incorporate adaptive biases into the [CLS] token in shallow layers of ViTs. We theoretically demonstrate that these augmentations, when optimized via entropy minimization, recover the information lost due to token aggregation. Extensive experiments across various out-of-distribution benchmarks demonstrate that NAVIA significantly outperforms state-of-the-art methods by over 2.5%, while achieving an inference latency reduction of more than 20%, effectively addressing the ETTA challenge.
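To make the "merge redundant tokens" idea concrete, the following is a minimal NumPy sketch of pairwise token merging by cosine similarity, in the spirit of plug-and-play aggregation schemes such as ToMe. The function name, the greedy one-pair-at-a-time strategy, and the plain averaging rule are illustrative assumptions, not the paper's exact mechanism:

```python
import numpy as np

def aggregate_tokens(tokens, r):
    """Greedily merge the r most-similar token pairs by averaging.

    tokens: (N, D) array of patch embeddings; returns an (N - r, D) array.
    This is a simplified stand-in for plug-and-play token aggregation
    (e.g. ToMe-style merging); the paper's actual scheme may differ.
    """
    toks = tokens.copy()
    for _ in range(r):
        normed = toks / np.linalg.norm(toks, axis=1, keepdims=True)
        sim = normed @ normed.T
        np.fill_diagonal(sim, -np.inf)          # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        merged = (toks[i] + toks[j]) / 2.0      # average the closest pair
        keep = [k for k in range(len(toks)) if k not in (i, j)]
        toks = np.vstack([toks[keep], merged])  # one fewer token per step
    return toks

rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))     # 16 tokens, embedding dim 8
out = aggregate_tokens(x, r=4)
print(out.shape)                 # (12, 8): 4 pairs merged away
```

Each merge discards the difference between the two averaged tokens, which is exactly the kind of information loss the abstract's mutual-information analysis quantifies.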
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in Test-Time Adaptation for Vision Transformers
Mitigating performance degradation from token aggregation in TTA methods
Enhancing adaptation capability while decreasing inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neutralizes token aggregation via information augmentation
Augments [CLS] token embedding with adaptive biases
Optimizes via entropy minimization to recover lost information
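The entropy-minimization step in the list above can be sketched in a few lines of NumPy. This is a toy illustration under assumptions of my own (a single frozen linear head `W`, a single sample, plain gradient descent), not the authors' implementation: only an additive bias on the [CLS] embedding is updated, mimicking the adaptive [CLS] biases NAVIA optimizes at test time:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12))

rng = np.random.default_rng(0)
D, C = 8, 5
W = rng.normal(size=(C, D))      # frozen classifier head (hypothetical)
cls = rng.normal(size=D)         # [CLS] embedding after token aggregation
bias = np.zeros(D)               # adaptive [CLS] bias: the only free parameter

lr = 0.1
h0 = entropy(softmax(W @ (cls + bias)))
for _ in range(50):
    z = W @ (cls + bias)
    p = softmax(z)
    H = entropy(p)
    # Gradient of entropy w.r.t. logits: dH/dz_j = -p_j (log p_j + H),
    # then chain rule through z = W (cls + bias) gives the bias gradient.
    grad_z = -p * (np.log(p + 1e-12) + H)
    bias -= lr * (W.T @ grad_z)
h1 = entropy(softmax(W @ (cls + bias)))
print(h0 > h1)                   # entropy drops: predictions sharpen
```

Driving the prediction entropy down in this way is the standard TTA objective (as in Tent); NAVIA's contribution is choosing the [CLS] embedding and shallow-layer [CLS] biases as the parameters this objective updates.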