🤖 AI Summary
Existing test-time adaptation (TTA) methods suffer from high computational overhead, heavy reliance on large volumes of target-domain data, or sensitivity to hyperparameters. This paper proposes NEO, a gradient-free TTA method that adds no significant computation or memory over vanilla inference. NEO leverages the geometric structure of Vision Transformer (ViT) latent spaces: it re-centers the feature embeddings of a batch of target samples at the origin, enabling hyperparameter-free, plug-and-play adaptation. On ImageNet-C, NEO boosts ViT-Base accuracy from 55.6% to 59.2% after adapting on a single batch of 64 samples, and on edge devices it cuts inference time by 63% and memory usage by 9% relative to competing TTA methods. It consistently outperforms state-of-the-art TTA approaches across multiple benchmarks, including ImageNet-R, and can even adapt on samples from a single class to improve accuracy on the other 999 ImageNet-C classes.
📝 Abstract
Test-Time Adaptation (TTA) methods are often computationally expensive, require a large amount of data for effective adaptation, or are brittle to hyperparameters. Building on a theoretical analysis of latent-space geometry, we show that re-centering target data embeddings at the origin significantly improves the alignment between source and distribution-shifted samples. This insight motivates NEO, a hyperparameter-free fully TTA method that adds no significant compute compared to vanilla inference. NEO improves the classification accuracy of ViT-Base on ImageNet-C from 55.6% to 59.2% after adapting on just one batch of 64 samples. When adapting on 512 samples, NEO beats all 7 TTA methods we compare against on ImageNet-C, ImageNet-R and ImageNet-S, and beats 6/7 on CIFAR-10-C, while using the least amount of compute. NEO performs well on model calibration metrics and can additionally adapt on a single class to improve accuracy on the other 999 classes in ImageNet-C. On Raspberry Pi and Jetson Orin Nano devices, NEO reduces inference time by 63% and memory usage by 9% compared to baselines. Our results across 3 ViT architectures and 4 datasets show that NEO is an efficient and effective method for TTA.
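The core re-centering step the abstract describes fits in a few lines. The sketch below is a hypothetical illustration, not the authors' released code: it assumes access to a batch of ViT feature embeddings (e.g. CLS tokens taken just before the classification head) as a NumPy array, subtracts the batch mean so the embeddings are centered at the origin, and leaves everything else (the frozen backbone and head) untouched.

```python
import numpy as np

def neo_recenter(target_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical sketch of NEO-style re-centering.

    target_embeddings: (batch_size, dim) array of ViT features for
    target-domain samples. Shifts the batch so its mean lies at the
    origin; the re-centered features are then fed to the frozen head.
    """
    mu = target_embeddings.mean(axis=0, keepdims=True)  # (1, dim) batch mean
    return target_embeddings - mu                        # centered at origin

# Toy example: 64 embeddings pushed off-origin by a constant shift,
# standing in for a distribution-shifted target batch.
rng = np.random.default_rng(0)
shifted = rng.normal(size=(64, 8)) + 5.0
centered = neo_recenter(shifted)
print(np.allclose(centered.mean(axis=0), 0.0))  # prints True
```

Because the operation is a single mean and subtraction per batch, it needs no gradients, no extra forward passes, and no tunable hyperparameters, which is consistent with the compute and memory figures reported above.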