Ego4OOD: Rethinking Egocentric Video Domain Generalization via Covariate Shift Scoring

📅 2026-01-21
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of generalizing egocentric video action recognition under domain shift, where large intra-class spatiotemporal variation, long-tailed feature distributions, and strong coupling between actions and environments hinder performance. To this end, we propose Ego4OOD, the first domain generalization benchmark derived from Ego4D, which explicitly targets covariate shift while mitigating concept shift through semantically consistent action partitioning. We further introduce a clustering-driven covariate shift score to quantify domain difficulty. Methodologically, we adopt a one-vs-all binary classification strategy, decomposing the multi-class task into independent binary subtasks, each solved by a lightweight two-layer fully connected network. Experiments demonstrate that our framework achieves state-of-the-art performance on both Ego4OOD and Argo1M with fewer parameters and without leveraging multimodal inputs, and for the first time empirically reveals a significant negative correlation between the degree of covariate shift and recognition accuracy.
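The one-vs-all decomposition described above can be sketched in a few lines. The paper's exact architecture and training details are not given here, so the following is a minimal illustration only: each class gets its own independent two-layer fully connected binary head (ReLU hidden layer, sigmoid output) over a shared feature vector, and prediction takes the argmax of the per-class "is this the class?" scores. All dimensions and initializations are assumptions for illustration.

```python
import numpy as np

def make_heads(feat_dim, hidden_dim, n_classes, rng):
    """One independent two-layer binary head per class (one-vs-all)."""
    heads = []
    for _ in range(n_classes):
        W1 = rng.normal(0.0, 0.1, (feat_dim, hidden_dim))  # hidden layer weights
        b1 = np.zeros(hidden_dim)
        w2 = rng.normal(0.0, 0.1, hidden_dim)              # output layer weights
        b2 = 0.0
        heads.append((W1, b1, w2, b2))
    return heads

def binary_score(head, x):
    """Sigmoid probability that feature vector x belongs to this head's class."""
    W1, b1, w2, b2 = head
    h = np.maximum(x @ W1 + b1, 0.0)        # ReLU hidden activation
    logit = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))

def predict(heads, x):
    """Each binary subtask is scored independently; take the highest score."""
    scores = np.array([binary_score(h, x) for h in heads])
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
heads = make_heads(feat_dim=16, hidden_dim=8, n_classes=5, rng=rng)
x = rng.normal(size=16)
pred = predict(heads, x)
```

Because each head is trained and scored independently, a distribution shift that confuses two visually similar classes degrades only those heads rather than the whole softmax, which is the intuition the summary appeals to.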

πŸ“ Abstract
Egocentric video action recognition under domain shifts remains challenging due to large intra-class spatio-temporal variability, long-tailed feature distributions, and strong correlations between actions and environments. Existing benchmarks for egocentric domain generalization often conflate covariate shifts with concept shifts, making it difficult to reliably evaluate a model's ability to generalize across input distributions. To address this limitation, we introduce Ego4OOD, a domain generalization benchmark derived from Ego4D that emphasizes measurable covariate diversity while reducing concept shift through semantically coherent, moment-level action categories. Ego4OOD spans eight geographically distinct domains and is accompanied by a clustering-based covariate shift metric that provides a quantitative proxy for domain difficulty. We further leverage a one-vs-all binary training objective that decomposes multi-class action recognition into independent binary classification tasks. This formulation is particularly well-suited for covariate shift by reducing interference between visually similar classes under feature distribution shift. Using this formulation, we show that a lightweight two-layer fully connected network achieves performance competitive with state-of-the-art egocentric domain generalization methods on both Argo1M and Ego4OOD, despite using fewer parameters and no additional modalities. Our empirical analysis demonstrates a clear relationship between measured covariate shift and recognition performance, highlighting the importance of controlled benchmarks and quantitative domain characterization for studying out-of-distribution generalization in egocentric video.
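The abstract's clustering-based covariate shift metric is not specified in detail here. One plausible instantiation, sketched below purely as an assumption, is to cluster pooled features from both domains jointly, then compare the two domains' cluster-occupancy histograms with the total variation distance: 0 when occupancies match, approaching 1 when the domains occupy disjoint regions of feature space.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return centers

def cluster_histogram(X, centers):
    """Fraction of samples assigned to each cluster."""
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    counts = np.bincount(dists.argmin(1), minlength=len(centers)).astype(float)
    return counts / counts.sum()

def covariate_shift_score(src, tgt, k=8):
    """Total variation distance between the domains' cluster occupancies.

    Clusters are fit on the pooled features so both domains share bins.
    Returns a value in [0, 1]; higher means a larger covariate shift.
    """
    centers = kmeans(np.vstack([src, tgt]), k)
    h_src = cluster_histogram(src, centers)
    h_tgt = cluster_histogram(tgt, centers)
    return 0.5 * np.abs(h_src - h_tgt).sum()

rng = np.random.default_rng(1)
src = rng.normal(size=(200, 4))
near = covariate_shift_score(src, rng.normal(size=(200, 4)))        # same distribution
far = covariate_shift_score(src, rng.normal(size=(200, 4)) + 5.0)   # shifted distribution
```

Under this sketch, `near` stays close to 0 while `far` approaches 1, matching the intended use of the score as a quantitative proxy for domain difficulty.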
Problem

Research questions and friction points this paper is trying to address.

egocentric video
domain generalization
covariate shift
action recognition
out-of-distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

covariate shift
domain generalization
egocentric video
binary decomposition
OOD benchmark