🤖 AI Summary
This work addresses the error accumulation and catastrophic forgetting that texture-biased backbones cause in continual test-time adaptation. Inspired by the human visual system's ability to disentangle shape and texture, the authors propose MoASE, a plug-in sparse mixture-of-experts architecture. Domain-aware routing weights sparsely activated experts that decouple domain-invariant structural features from domain-specific textures. Several components further improve stability and controllability during continual adaptation: exponential moving average (EMA)-anchored on-policy reverse KL distillation, Spatial Differentiable Dropout (SDD), low- and high-rank bottleneck layers, and an augmentation policy conditioned on entropy and confidence. The authors report state-of-the-art performance on CIFAR-10/100-C and ImageNet-C for robust classification, and on the Cityscapes→ACDC domain shift for semantic segmentation.
📝 Abstract
Continual test-time adaptation adapts a source-pretrained model to non-stationary, unlabeled target streams while retaining past competence, yet texture-biased backbones risk error accumulation and catastrophic forgetting. Drawing inspiration from how the human visual system decouples shape and texture, we introduce MoASE, a plug-in mixture-of-experts that disentangles domain-agnostic structure from domain-specific texture. MoASE uses Activation Sparsity Experts with Spatial Differentiable Dropout (SDD) to form complementary high- and low-activation pathways, while high- and low-rank bottlenecks diversify representations. The Activation Sparsity Gate produces input-adaptive SDD thresholds for precise token selection, and the Domain-Aware Router assigns per-sample expert weights using texture-sensitive cues. To curb confirmation bias on unlabeled streams and stabilize supervision, we further introduce Domain-Adaptive On-Policy Distillation, which extends MoASE to MoASE++. It combines EMA-anchored on-policy reverse KL distillation with an augmentation policy conditioned on entropy and confidence that aligns predictions across the same views and improves the robustness-plasticity balance. Extensive experiments on classification (CIFAR-10/100-C, ImageNet-C) and semantic segmentation (Cityscapes→ACDC) demonstrate consistent state-of-the-art performance, offering a principled, controllable approach to continual adaptation in dynamic visual environments.
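The abstract names two generic ingredients of the distillation step without giving formulas: a reverse KL objective and an EMA-anchored teacher. The sketch below illustrates only those textbook definitions, not the authors' implementation. It assumes that "reverse KL" means KL(student ‖ teacher), with the expectation taken under the student's own predictions, and that the teacher's parameters are an exponential moving average of the student's. The function names, the flat parameter lists, and the momentum value are hypothetical choices made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of class logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_logits, teacher_logits, eps=1e-12):
    # KL(student || teacher): weighted by the student's own distribution,
    # which is what makes the objective "on-policy" in this sketch.
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    return sum(s * (math.log(s + eps) - math.log(t + eps))
               for s, t in zip(p_s, p_t))

def ema_update(teacher_params, student_params, momentum=0.999):
    # teacher <- momentum * teacher + (1 - momentum) * student, elementwise.
    # The momentum value is an assumption, not taken from the paper.
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

Under these assumptions the loss is zero when student and teacher agree, and it is asymmetric: swapping the two arguments generally gives a different value, which is what distinguishes reverse KL from the forward direction. A slowly moving EMA teacher changes less per step than the student, which is one common way to stabilize pseudo-supervision on an unlabeled stream.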