🤖 AI Summary
This work addresses the performance degradation of models under test-time distribution shifts by proposing AcTTA, a test-time adaptation (TTA) framework that extends adaptation beyond normalization layers to activation functions. Existing TTA methods rely predominantly on affine modulation of normalization layers and overlook the role activations play in shaping representations. AcTTA instead reformulates conventional activation functions such as ReLU and GELU into learnable, parameterized forms whose response threshold and gradient sensitivity can be adjusted during inference. This enables lightweight adaptation without updating network weights or accessing source-domain data. Across CIFAR10-C, CIFAR100-C, and ImageNet-C, AcTTA consistently outperforms normalization-based TTA methods while adapting robustly and stably across diverse corruptions.
📝 Abstract
Test-time adaptation (TTA) aims to mitigate performance degradation under distribution shifts by updating model parameters during inference. Existing approaches have primarily framed adaptation around affine modulation, focusing on recalibrating normalization layers. This perspective, while effective, overlooks another influential component in representation dynamics: the activation function. We revisit this overlooked space and propose AcTTA, an activation-aware framework that reinterprets conventional activation functions from a learnable perspective and updates them adaptively at test time. AcTTA reformulates activation functions such as ReLU and GELU into parameterized forms that shift their response threshold and modulate gradient sensitivity. This functional reparameterization lets the network adjust its activation behavior continuously under domain shifts, without modifying network weights or requiring source data. Despite its simplicity, AcTTA achieves robust and stable adaptation across diverse corruptions. Across CIFAR10-C, CIFAR100-C, and ImageNet-C, AcTTA consistently surpasses normalization-based TTA methods. Our findings highlight activation adaptation as a compact and effective route toward domain-shift-robust test-time learning, broadening the prevailing affine-centric view of adaptation.
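To make the idea of a parameterized activation concrete, the sketch below shows one way a ReLU and a GELU could be given a movable threshold and an adjustable sensitivity. The abstract does not state the exact functional form AcTTA uses, so the formulas, the function names (`shifted_relu`, `shifted_gelu`), and the parameter names (`threshold`, `slope`, `sharpness`) are all assumptions made for illustration only. With default parameters both functions reduce to the standard activations.

```python
import math


def shifted_relu(x, threshold=0.0, slope=1.0):
    """ReLU with a movable threshold and an adjustable positive-side slope.

    Hypothetical form (not taken from the paper): slope * max(0, x - threshold).
    """
    return slope * max(0.0, x - threshold)


def shifted_gelu(x, threshold=0.0, sharpness=1.0):
    """GELU with a movable threshold and an adjustable gate sharpness.

    Hypothetical form (not taken from the paper): z * Phi(sharpness * z),
    where z = x - threshold and Phi is the standard normal CDF.
    """
    z = x - threshold
    gate = 0.5 * (1.0 + math.erf(sharpness * z / math.sqrt(2.0)))
    return z * gate


# Defaults recover the standard activations.
print(shifted_relu(2.0), shifted_relu(-1.0))
print(shifted_gelu(1.0))

# Moving the threshold changes where the unit starts to respond;
# changing the slope changes how strongly it responds.
print(shifted_relu(2.0, threshold=0.5, slope=2.0))
```

In a test-time adaptation setting, only `threshold` and `slope` (or `sharpness`) would be treated as trainable while the network weights stay frozen; the adaptation objective itself is not specified in the abstract and is left out of this sketch.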