FedCVU: Federated Learning for Cross-View Video Understanding

📅 2026-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three key challenges in federated cross-view video understanding: non-IID data caused by view heterogeneity, inconsistent semantic representations across views, and high communication overhead. To tackle these issues, the authors propose FedCVU, a novel framework that integrates view-specific normalization (VS-Norm), a lightweight cross-view contrastive alignment module (CV-Align), and a selective layer aggregation strategy (SLA) to enable efficient, robust, and privacy-preserving multi-view collaborative learning. Experimental results demonstrate that FedCVU significantly improves accuracy on unseen views in both cross-view action recognition and person re-identification tasks, outperforming existing federated approaches while exhibiting strong robustness to domain shifts and communication constraints.
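The summary does not state the form of CV-Align's contrastive objective. As a rough illustration only, a generic InfoNCE-style loss over paired embeddings from two camera views might look like the sketch below. The function name `info_nce`, the temperature value, and the assumption that row i in one view pairs with row i in the other are all hypothetical and are not taken from the paper.

```python
import math


def info_nce(view_a, view_b, temperature=0.1):
    """Generic InfoNCE loss over paired embeddings (lists of float vectors).

    Assumption: view_a[i] and view_b[i] embed the same sample seen from two
    views, so they form the positive pair; all other rows act as negatives.
    This is a stand-in for a cross-view alignment term, not the paper's loss.
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]

    total = 0.0
    for i, anchor in enumerate(a):
        # Cosine similarity of the anchor to every embedding in the other view.
        logits = [
            sum(x * y for x, y in zip(anchor, other)) / temperature
            for other in b
        ]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Negative log-probability assigned to the positive pair.
        total += log_denom - logits[i]
    return total / len(a)
```

The loss is small when matching samples from the two views sit close together in embedding space and grows when a sample is closer to another sample's cross-view embedding than to its own.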

📝 Abstract

Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm, which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module to improve cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy. Extensive experiments on action understanding and person re-identification tasks under a cross-view protocol demonstrate that FedCVU consistently boosts unseen-view accuracy while maintaining strong seen-view performance, outperforming state-of-the-art FL baselines and showing robustness to domain heterogeneity and communication constraints.
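The abstract describes VS-Norm and SLA only at a high level, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes that normalization parameters can be identified by the substring `norm` in their names and are simply left out of server-side averaging, and that the layers selected for aggregation are chosen by name prefix. The function `aggregate` and its parameters are hypothetical, and model weights are represented as plain lists of floats to keep the example dependency-free.

```python
def aggregate(global_state, client_states, client_sizes,
              shared_prefixes, norm_marker="norm"):
    """Weighted average of selected layers; all other entries are left as-is.

    global_state / client_states: dicts mapping parameter name -> list of floats.
    client_sizes: number of local samples per client, used as averaging weights.
    shared_prefixes: name prefixes of layers that are communicated (assumed
        selection rule; the paper's actual criterion is not given here).
    norm_marker: substring identifying normalization parameters, which are
        assumed to stay on each client and are never averaged.
    """
    total = sum(client_sizes)
    new_global = dict(global_state)
    for name, values in global_state.items():
        if norm_marker in name:
            continue  # view-specific normalization parameters stay local
        if not any(name.startswith(prefix) for prefix in shared_prefixes):
            continue  # layer not selected for communication this round
        new_global[name] = [
            sum(state[name][k] * size
                for state, size in zip(client_states, client_sizes)) / total
            for k in range(len(values))
        ]
    return new_global
```

Under these assumptions, only the selected non-normalization layers would need to be uploaded each round, which is where the communication saving would come from.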

Problem

Research questions and friction points this paper is trying to address.

federated learning

cross-view video understanding

non-IID data

representation alignment

communication overhead

Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning

Cross-View Video Understanding

Non-IID Data