🤖 AI Summary
Existing test-time adaptation (TTA) benchmarks fail to model the natural temporal dependencies of real-world video streams, such as object persistence across consecutive frames, which limits how well their conclusions transfer to dynamic scenarios. To address this, we introduce ITD, a tracklet-based TTA benchmark constructed from an object-tracking dataset so that its evaluation streams explicitly encode spatiotemporal continuity. To tackle the resulting challenges, particularly rapid, sequential domain shifts, we propose an adversarial memory initialization mechanism: adversarial training is used to optimize the initial prototypes of the memory module, enhancing the model’s capacity for swift self-adaptation to continuous domain drift. Experiments show that this strategy substantially improves the performance of various memory-based TTA methods on ITD. This work establishes a realistic benchmark for video-stream-oriented TTA research and offers a practical way to address temporal dynamics in test-time adaptation.
📝 Abstract
We introduce a novel tracklet-based dataset for benchmarking test-time adaptation (TTA) methods. The aim of this dataset is to mimic the intricate challenges encountered in real-world environments, such as those posed by images captured with hand-held cameras or by self-driving cars. Current TTA benchmarks focus on how deployed models handle distribution shifts and on violations of the customary independent-and-identically-distributed (i.i.d.) assumption in machine learning. Yet these benchmarks fail to faithfully represent realistic scenarios that naturally display temporal dependencies, such as consecutive frames of a video stream that are likely to show the same object over time. We address this shortcoming by proposing a novel TTA benchmark that we call the "Inherent Temporal Dependencies" (ITD) dataset. We ensure that the instances in ITD naturally embody temporal dependencies by collecting them from tracklets, i.e., sequences of object-centric images that we compile from the bounding boxes of an object-tracking dataset. We use ITD to conduct a thorough experimental analysis of current TTA methods and shed light on their limitations when faced with temporal dependencies. Building on these insights, we propose a novel adversarial memory initialization strategy to improve memory-based TTA methods. We find that this strategy substantially boosts the performance of various methods on our challenging benchmark.
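To make the tracklet idea concrete, the following is a minimal sketch of how bounding-box annotations from a tracking dataset might be grouped into tracklets and concatenated into a temporally dependent test stream. The `Box` record, the `build_tracklets` and `stream_labels` helpers, the `min_len` filter, and the toy annotations are all hypothetical and chosen for illustration; they are not the ITD construction pipeline, whose details are not given here.

```python
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """One annotated bounding box (hypothetical record layout)."""
    frame: int      # frame index within the video
    track_id: int   # object identity from the tracking annotations
    label: str      # object class
    xyxy: tuple     # (x1, y1, x2, y2) in pixels, used to crop the object


def build_tracklets(boxes, min_len=2):
    """Group boxes by track_id and sort each group by frame index.

    Tracks shorter than min_len are dropped (an assumed filter).
    """
    by_track = defaultdict(list)
    for box in boxes:
        by_track[box.track_id].append(box)

    tracklets = []
    for track_id in sorted(by_track):
        seq = sorted(by_track[track_id], key=lambda b: b.frame)
        if len(seq) >= min_len:
            tracklets.append(seq)
    return tracklets


def stream_labels(tracklets):
    """Concatenate tracklets into one test stream and return its labels."""
    return [box.label for tracklet in tracklets for box in tracklet]


# Toy annotations: two objects visible over several frames, one seen once.
annotations = [
    Box(0, 1, "car", (10, 10, 50, 40)),
    Box(0, 2, "person", (60, 20, 80, 90)),
    Box(1, 1, "car", (12, 10, 52, 40)),
    Box(1, 2, "person", (61, 20, 81, 90)),
    Box(2, 1, "car", (14, 11, 54, 41)),
    Box(5, 3, "dog", (0, 0, 5, 5)),
]

tracklets = build_tracklets(annotations)
labels = stream_labels(tracklets)
print(labels)
```

In the resulting stream, consecutive samples tend to share both the object identity and the class label, which is the kind of non-i.i.d. structure the abstract describes. A shuffled stream of the same crops would hide this dependency.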