Return of Frustratingly Easy Unsupervised Video Domain Adaptation

📅 2026-05-19
📈 Citations: 0
Influential: 0
📄 PDF

career value

186K/year
🤖 AI Summary
This work addresses the challenge of spatial and temporal distribution shifts in unsupervised video domain adaptation by proposing MetaTrans, a novel approach that explicitly decouples spatial and temporal domain offsets. MetaTrans introduces a temporal-static subtraction module to disentangle dynamic and static features, which are then jointly optimized through a dual-loss objective. The method employs a concise yet purpose-built network architecture that effectively mitigates both types of domain shift. Extensive experiments demonstrate that MetaTrans substantially outperforms current state-of-the-art methods across multiple cross-domain action recognition benchmarks, achieving significant improvements in both absolute and relative adaptation performance.
📝 Abstract
Unsupervised video domain adaptation (UVDA) is a practical but under-explored problem. In this paper, we propose a frustratingly easy UVDA method, called MetaTrans. Specifically, MetaTrans adopts a concise learning objective that contains only two fundamental loss terms. Despite the simplicity of the learning objective, MetaTrans embodies an advanced UVDA idea, that is, handling the spatial and temporal divergence of cross-domain videos separately, through a subtle model architecture design. By implementing a temporal-static subtraction module, MetaTrans effectively removes spatial and temporal divergence. Extensive empirical evaluations, particularly on various cross-domain action recognition tasks, show substantial absolute adaptation performance enhancement and significantly superior relative performance gain compared with state-of-the-art UVDA baselines.
Problem

Research questions and friction points this paper is trying to address.

Unsupervised Video Domain Adaptation
Spatial Divergence
Temporal Divergence
Cross-domain Action Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

unsupervised video domain adaptation
spatial-temporal divergence
MetaTrans
temporal-static subtraction
domain adaptation
🔎 Similar Papers