🤖 AI Summary
This work addresses blind anchoring in continual test-time adaptation (CTTA): source-anchored methods keep applying the same anchor strength even after the frozen source model has become unreliable. It proposes RMemSafe, a reliability-gated extension of ROID that controls anchoring according to source reliability. Using the frozen source's normalized predictive entropy as the reliability signal, RMemSafe attenuates every explicit source-coupled term in the objective and, as the source posterior approaches uniformity, reduces to a source-agnostic fallback made up of ROID's base losses plus marginal calibration. Combined with the ASR reset mechanism, RMemSafe achieves the lowest error in eight of nine matched-split continual-corruption cells and is the best reset-based method in all nine, improving on ROID+ASR by 1.05 pp on ResNet-50 and 0.48 pp on ViT-B/16. In a controlled source-degradation sweep, its harm slope is 1.13× shallower than that of ROID+ASR, consistent with graceful degradation. The gate detects high-entropy source collapse, not confidently wrong low-entropy sources.
📝 Abstract
Continual test-time adaptation (CTTA) updates a pretrained model online on an unlabeled, non-stationary stream while anchoring it to a frozen source checkpoint. This anchor is useful only when the source remains reliable. On CCC-Hard, however, a ResNet-50 source falls to approximately $1.3\%$ top-$1$ accuracy, while existing source-anchored CTTA methods continue applying the same anchor strength. We call this failure mode blind anchoring and propose RMemSafe, a reliability-gated extension of ROID that uses the frozen source's normalized predictive entropy to attenuate all explicit source-coupled uses in the objective. When the source posterior approaches uniformity, the gate closes: the source anchor and agreement filter vanish, and the objective reduces to a source-agnostic fallback comprising ROID's base losses plus marginal calibration. Combined with ASR, RMemSafe achieves the lowest error on $8$ of $9$ matched-split continual-corruption cells and is the best reset-based method on all $9$, improving ROID+ASR by $1.05$ pp on ResNet-50 and $0.48$ pp on ViT-B/16. A controlled source-degradation sweep shows a $1.13{\times}$ shallower harm slope than ROID+ASR, consistent with the graceful-decay prediction. The entropy gate detects high-entropy source collapse, not confidently wrong low-entropy sources; this scope is explicitly evaluated and discussed.
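The gating idea can be illustrated with a minimal sketch. It is not the authors' implementation: the linear gate form `1 - H_norm`, the batch averaging, and every function name and loss value below are assumptions made for illustration, and only the source-anchor term is gated here (the agreement filter is omitted). The sketch shows how a normalized entropy in $[0, 1]$ could scale a source-coupled loss so that it vanishes when the source posterior is uniform.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of one probability vector, divided by log(K) so it lies in [0, 1]."""
    num_classes = len(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(num_classes)

def reliability_gate(source_probs):
    """Hypothetical linear gate: 1 for a one-hot source posterior, 0 for a uniform one."""
    return min(1.0, max(0.0, 1.0 - normalized_entropy(source_probs)))

def gated_objective(base_loss, marginal_loss, anchor_loss, source_batch):
    """Scale the source-coupled term by the batch-mean gate (an assumed aggregation).

    base_loss and marginal_loss stand in for the source-agnostic fallback terms;
    anchor_loss stands in for the source-anchor term. All three are plain floats here.
    """
    gate = sum(reliability_gate(p) for p in source_batch) / len(source_batch)
    return base_loss + marginal_loss + gate * anchor_loss, gate

# A collapsed (uniform) source closes the gate; a confident source opens it.
collapsed_batch = [[0.25, 0.25, 0.25, 0.25]] * 3
confident_batch = [[1.0, 0.0, 0.0, 0.0]] * 3

total_collapsed, gate_collapsed = gated_objective(1.0, 0.5, 2.0, collapsed_batch)
total_confident, gate_confident = gated_objective(1.0, 0.5, 2.0, confident_batch)
```

With the collapsed batch the objective reduces to the two fallback terms, and with the confident batch the anchor term is applied at full strength. A confidently wrong source would also yield a gate near 1, which matches the scope limitation stated in the abstract: the entropy signal flags high-entropy collapse only.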