🤖 AI Summary
To address the performance degradation that quantized models suffer on edge devices under distribution shift, this paper proposes LeanTTA, a backpropagation-free, stateless test-time adaptation (TTA) framework. The method updates normalization statistics dynamically and is the first to combine partial adaptation with quantized module fusion, enabling low-overhead adaptation without large batches or historical data. On ResNet-18, peak memory usage is only 11.2 MB, and on-device adaptation runs within an order of magnitude of normal inference speed. Across sensor modalities, it achieves a 15.7% error reduction over state-of-the-art TTA methods. The core contribution is gradient-free TTA for quantized models that keeps no historical state, balancing accuracy against the compute and memory constraints of edge deployment.
📝 Abstract
While there are many advantages to deploying machine learning models on edge devices, the resource constraints of mobile platforms, the dynamic nature of the environment, and differences between the distributions of training and in-the-wild data make such deployments challenging. Current test-time adaptation methods are often memory-intensive and are not designed to be quantization-compatible or to run on low-resource devices. To address these challenges, we present LeanTTA, a novel backpropagation-free and stateless framework for quantized test-time adaptation tailored to edge devices. Our approach minimizes computational costs by dynamically updating normalization statistics without backpropagation, which frees LeanTTA from the common pitfall of relying on large batches and historical data and makes it robust to realistic deployment scenarios. It is also the first to enable further computational gains by combining partial adaptation with quantized module fusion. We validate our framework across sensor modalities, demonstrating significant improvements over state-of-the-art TTA methods, including a 15.7% error reduction, peak memory usage of only 11.2 MB for ResNet18, and fast adaptation within an order of magnitude of normal inference speeds on-device. LeanTTA provides a robust solution for achieving the right trade-offs between accuracy and system efficiency in edge deployments, addressing the unique challenges posed by limited data and varied operational conditions.
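The idea of updating normalization statistics without backpropagation can be illustrated with a minimal sketch. It assumes a simple linear blend between the statistics stored at training time and the statistics of the current input. The function names and the `weight` parameter are hypothetical; the abstract does not specify LeanTTA's actual update rule, so this is not a reproduction of it.

```python
from statistics import fmean, pvariance


def adapt_norm_stats(source_mean, source_var, activations, weight=0.5):
    """Blend stored (source) statistics with those of the current input.

    `weight` is a hypothetical mixing coefficient: 0 keeps the source
    statistics, 1 uses only the current input. No gradients are computed
    and nothing is kept between calls, so the update is stateless.
    """
    cur_mean = fmean(activations)
    cur_var = pvariance(activations, mu=cur_mean)
    mean = (1 - weight) * source_mean + weight * cur_mean
    var = (1 - weight) * source_var + weight * cur_var
    return mean, var


def normalize(activations, mean, var, eps=1e-5):
    """Standardize activations with the given statistics."""
    scale = (var + eps) ** -0.5
    return [(a - mean) * scale for a in activations]


# Source statistics come from training; the test-time input has shifted.
shifted = [2.0, 4.0, 6.0]
mean, var = adapt_norm_stats(0.0, 1.0, shifted, weight=1.0)
normalized = normalize(shifted, mean, var)
```

With `weight=1.0` the shifted input is re-centered around zero using only its own statistics; intermediate values trade off trust in the training statistics against trust in a single, possibly noisy, test input.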