🤖 AI Summary
This work addresses the limitations of existing test-time adaptation (TTA) methods. These methods rely on backpropagation, incur high computational overhead, and are incompatible with non-differentiable models such as quantized networks, which hinders deployment on edge devices. To overcome these challenges, the authors propose ZOTTA, a backpropagation-free TTA framework that uses zeroth-order optimization (ZOO) to adapt with forward passes only. Its two key components, Distribution-Robust Layer Selection and Spatial Feature Aggregation Alignment, jointly reduce the optimization dimensionality and stabilize optimization, enabling architecture-agnostic adaptation. Experiments show that ZOTTA matches or surpasses gradient-based methods on ImageNet-C, ImageNet-R, ImageNet-Sketch, and ImageNet-A. On ImageNet-C it reduces memory consumption by 84% and improves accuracy by 3.9% relative to SAR.
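To make "adaptation using only forward passes" concrete, here is a minimal sketch of a zeroth-order gradient estimate. The summary does not say which ZOO estimator ZOTTA uses. This sketch assumes the standard two-point random-direction (Gaussian-smoothing) estimator, and the function names `zo_gradient` and `zo_sgd` are hypothetical. Each step costs `2 * num_dirs` loss evaluations and no backward pass. The estimator's variance grows with the number of parameters, which is why shrinking the set of updated parameters helps convergence.

```python
import numpy as np

def zo_gradient(loss_fn, theta, mu=1e-3, num_dirs=8, rng=None):
    """Estimate the gradient of loss_fn at theta from forward evaluations only.

    Assumed estimator (two-point, Gaussian directions u_i ~ N(0, I)):
        g = (1/q) * sum_i [(L(theta + mu*u_i) - L(theta - mu*u_i)) / (2*mu)] * u_i
    No backpropagation is used; loss_fn is treated as a black box.
    """
    rng = np.random.default_rng() if rng is None else rng
    grad = np.zeros_like(theta)
    for _ in range(num_dirs):
        u = rng.standard_normal(theta.shape)
        # Central finite difference along the random direction u.
        diff = loss_fn(theta + mu * u) - loss_fn(theta - mu * u)
        grad += (diff / (2.0 * mu)) * u
    return grad / num_dirs

def zo_sgd(loss_fn, theta, lr=0.05, steps=200, **kw):
    """Plain SGD driven by the zeroth-order gradient estimate (hypothetical helper)."""
    for _ in range(steps):
        theta = theta - lr * zo_gradient(loss_fn, theta, **kw)
    return theta
```

For example, `zo_sgd(lambda t: float(np.sum((t - target) ** 2)), np.zeros(3), rng=np.random.default_rng(0))` drives `t` toward `target` without ever computing an analytic gradient. In TTA, `loss_fn` would instead be an unsupervised objective evaluated on a test batch.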
📝 Abstract
Test-time adaptation (TTA) aims to improve model robustness under distribution shifts by adapting to unlabeled test data. However, most existing methods rely on backpropagation (BP), which is computationally costly and incompatible with non-differentiable models such as quantized models, limiting practical deployment on numerous edge devices. Recent BP-free approaches alleviate this overhead but are either architecture-specific or lack the optimization capacity needed for high-dimensional models. We propose ZOTTA, a fully BP-free TTA framework that performs efficient adaptation using only forward passes via Zeroth-Order Optimization (ZOO). While ZOO is theoretically appealing, naive application leads to slow convergence in high-dimensional parameter spaces and to unstable optimization due to the lack of labels. ZOTTA overcomes these challenges through two components. 1) Distribution-Robust Layer Selection automatically identifies and freezes layers that already extract distribution-invariant features and updates only domain-sensitive layers, reducing the optimization dimensionality and accelerating convergence. 2) Spatial Feature Aggregation Alignment stabilizes ZOO by aligning globally aggregated spatial features between the source and target domains to reduce gradient variance. Together, these components enable architecture-agnostic and stable BP-free adaptation. Extensive experiments on ImageNet-C/R/Sketch/A show that ZOTTA outperforms or matches BP-based methods. For example, on ImageNet-C it reduces memory usage by 84% and improves accuracy by 3.9% compared with SAR.
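The abstract describes Spatial Feature Aggregation Alignment only at a high level. The following is a minimal sketch of one plausible label-free alignment objective, not ZOTTA's actual loss. It assumes that "global aggregation" means global average pooling over the spatial dimensions, that source statistics are per-channel mean and variance of the pooled features, and that the distance is squared L2. The names `global_aggregate`, `source_stats`, and `alignment_loss` are hypothetical. Because the loss is computed from forward activations alone, it could serve as the `loss_fn` of a zeroth-order optimizer.

```python
import numpy as np

def global_aggregate(feats):
    """Assumed aggregation: global average pooling, (N, C, H, W) -> (N, C)."""
    return feats.mean(axis=(2, 3))

def source_stats(src_feats):
    """Per-channel mean and variance of pooled source features (computed once, offline)."""
    pooled = global_aggregate(src_feats)
    return pooled.mean(axis=0), pooled.var(axis=0)

def alignment_loss(tgt_feats, src_mean, src_var):
    """Squared L2 distance between pooled target and stored source statistics.

    Needs only a forward pass over the target batch; no labels are used.
    """
    pooled = global_aggregate(tgt_feats)
    mean_term = np.sum((pooled.mean(axis=0) - src_mean) ** 2)
    var_term = np.sum((pooled.var(axis=0) - src_var) ** 2)
    return float(mean_term + var_term)

# Toy check: a shifted and scaled "target" domain is misaligned until corrected.
rng = np.random.default_rng(0)
src = rng.standard_normal((256, 4, 8, 8))
tgt = 2.0 * src + 1.0                      # hypothetical domain shift
mu_s, var_s = source_stats(src)
loss_shifted = alignment_loss(tgt, mu_s, var_s)
loss_corrected = alignment_loss((tgt - 1.0) / 2.0, mu_s, var_s)
```

In this toy setup the shift raises the loss, and undoing it with a per-channel affine correction brings the loss back to roughly zero. This mirrors how adapting the parameters of a selected layer could reduce the objective.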