FedCVU: Federated Learning for Cross-View Video Understanding

📅 2026-03-23
🤖 AI Summary
This work addresses three key challenges in federated cross-view video understanding: non-IID data caused by view heterogeneity, inconsistent semantic representations across views, and high communication overhead. To tackle these issues, the authors propose FedCVU, a novel framework that integrates view-specific normalization (VS-Norm), a lightweight cross-view contrastive alignment module (CV-Align), and a selective layer aggregation strategy (SLA) to enable efficient, robust, and privacy-preserving multi-view collaborative learning. Experimental results demonstrate that FedCVU significantly improves accuracy on unseen views in both cross-view action recognition and person re-identification tasks, outperforming existing federated approaches while exhibiting strong robustness to domain shifts and communication constraints.

📝 Abstract
Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm, which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module to improve cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy. Extensive experiments on action understanding and person re-identification tasks under a cross-view protocol demonstrate that FedCVU consistently boosts unseen-view accuracy while maintaining strong seen-view performance, outperforming state-of-the-art FL baselines and showing robustness to domain heterogeneity and communication constraints.
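
The abstract gives no implementation details, so the following is only a minimal sketch of how the server-side step might combine two of the ideas it names: keeping normalization parameters out of aggregation (in the spirit of VS-Norm) and averaging only a chosen subset of layers (in the spirit of SLA). Everything concrete here is an assumption for illustration: the function names `is_norm_param` and `aggregate_selected`, the parameter naming convention (`norm` or `bn` as a dotted name component), the prefix-based layer selection, and the use of sample-count-weighted averaging. It is not the authors' algorithm.

```python
from typing import Dict, List, Sequence

# A model state is modeled as a mapping from parameter name to a flat list of floats.
State = Dict[str, List[float]]


def is_norm_param(name: str) -> bool:
    """Hypothetical naming convention: a parameter belongs to a normalization
    layer if one of its dotted name components is 'norm' or 'bn'."""
    return any(part in ("norm", "bn") for part in name.split("."))


def aggregate_selected(
    global_state: State,
    client_states: Sequence[State],
    client_sizes: Sequence[int],
    shared_prefixes: Sequence[str],
) -> State:
    """Weighted-average only the selected, non-normalization parameters.

    Parameters that are skipped keep their previous global value; in a real
    system each client would simply keep its own local copy of them.
    """
    total = sum(client_sizes)
    new_global = {name: list(values) for name, values in global_state.items()}

    for name, old_values in global_state.items():
        if is_norm_param(name):
            continue  # normalization statistics stay view-specific (assumed)
        if not name.startswith(tuple(shared_prefixes)):
            continue  # layer not selected for communication (assumed rule)

        averaged = [0.0] * len(old_values)
        for state, size in zip(client_states, client_sizes):
            for i, weight in enumerate(state[name]):
                averaged[i] += weight * size / total
        new_global[name] = averaged

    return new_global


# Toy example: two clients (camera views) with different sample counts.
global_state = {
    "backbone.conv.weight": [0.0, 0.0],
    "backbone.norm.weight": [1.0],
    "head.fc.weight": [5.0],
}
client_a = {
    "backbone.conv.weight": [1.0, 2.0],
    "backbone.norm.weight": [9.0],
    "head.fc.weight": [7.0],
}
client_b = {
    "backbone.conv.weight": [3.0, 6.0],
    "backbone.norm.weight": [4.0],
    "head.fc.weight": [8.0],
}

updated = aggregate_selected(
    global_state, [client_a, client_b], [1, 3], shared_prefixes=["backbone."]
)
```

In this toy run only `backbone.conv.weight` is averaged; the normalization parameter and the unselected head are left untouched, which is also why they would never need to be uploaded. The contrastive alignment component (CV-Align) is a client-side training loss and is not covered by this sketch.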
Problem

Research questions and friction points this paper is trying to address.

federated learning
cross-view video understanding
non-IID data
representation alignment
communication overhead
Innovation

Methods, ideas, or system contributions that make the work stand out.

Federated Learning
Cross-View Video Understanding
Non-IID Data
Contrastive Alignment
Communication Efficiency
Shenghan Zhang
Software College, Northeastern University, Shenyang, China
Run Ling
Software College, Northeastern University, Shenyang, China
Ke Cao
University of Science and Technology of China
low-level vision, video generation
Ao Ma
JD.com
Generative AI, Video Generation
Zhanjie Zhang
Zhejiang University
computer vision