Return of Frustratingly Easy Unsupervised Video Domain Adaptation

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

This work addresses the challenge of spatial and temporal distribution shifts in unsupervised video domain adaptation by proposing MetaTrans, a novel approach that explicitly decouples spatial and temporal domain offsets. MetaTrans introduces a temporal-static subtraction module to disentangle dynamic and static features, which are then jointly optimized through a dual-loss objective. The method employs a concise yet purpose-built network architecture that effectively mitigates both types of domain shift. Extensive experiments demonstrate that MetaTrans substantially outperforms current state-of-the-art methods across multiple cross-domain action recognition benchmarks, achieving significant improvements in both absolute and relative adaptation performance.

📝 Abstract

Unsupervised video domain adaptation (UVDA) is a practical but under-explored problem. In this paper, we propose a frustratingly easy UVDA method, called MetaTrans. Specifically, MetaTrans adopts a concise learning objective that contains only two fundamental loss terms. Despite the simplicity of the learning objective, MetaTrans embodies an advanced UVDA idea, that is, handling the spatial and temporal divergence of cross-domain videos separately, through a subtle model architecture design. By implementing a temporal-static subtraction module, MetaTrans effectively removes spatial and temporal divergence. Extensive empirical evaluations, particularly on various cross-domain action recognition tasks, show substantial absolute adaptation performance enhancement and significantly superior relative performance gain compared with state-of-the-art UVDA baselines.

Problem

Research questions and friction points this paper is trying to address.

Unsupervised Video Domain Adaptation

Spatial Divergence

Temporal Divergence

Cross-domain Action Recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

unsupervised video domain adaptation

spatial-temporal divergence

MetaTrans

temporal-static subtraction