🤖 AI Summary
This work addresses three key challenges in federated cross-view video understanding: non-IID data caused by view heterogeneity, inconsistent semantic representations across views, and high communication overhead. To tackle these issues, the authors propose FedCVU, a novel framework that integrates view-specific normalization (VS-Norm), a lightweight cross-view contrastive alignment module (CV-Align), and a selective layer aggregation strategy (SLA) to enable efficient, robust, and privacy-preserving multi-view collaborative learning. Experimental results demonstrate that FedCVU significantly improves accuracy on unseen views in both cross-view action recognition and person re-identification tasks, outperforming existing federated approaches while exhibiting strong robustness to domain shifts and communication constraints.
📝 Abstract
Federated learning (FL) has emerged as a promising paradigm for privacy-preserving multi-camera video understanding. However, applying FL to cross-view scenarios faces three major challenges: (i) heterogeneous viewpoints and backgrounds lead to highly non-IID client distributions and overfitting to view-specific patterns, (ii) local distribution biases cause misaligned representations that hinder consistent cross-view semantics, and (iii) large video architectures incur prohibitive communication overhead. To address these issues, we propose FedCVU, a federated framework with three components: VS-Norm (view-specific normalization), which preserves normalization parameters to handle view-specific statistics; CV-Align, a lightweight contrastive regularization module that improves cross-view representation alignment; and SLA, a selective layer aggregation strategy that reduces communication without sacrificing accuracy. Extensive experiments on action understanding and person re-identification tasks under a cross-view protocol demonstrate that FedCVU consistently boosts unseen-view accuracy while maintaining strong seen-view performance, outperforming state-of-the-art FL baselines and showing robustness to domain heterogeneity and communication constraints.
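The abstract does not give implementation details, so the following is only a minimal sketch of how two of the ideas might combine on the server side: normalization parameters stay on each client (in the spirit of VS-Norm), and only a chosen subset of layers is averaged and transmitted (in the spirit of SLA). The parameter naming scheme, the `shared_prefixes` argument, and the size-weighted averaging rule are all assumptions made for illustration, not the authors' method. CV-Align is omitted because its loss is not specified here.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def is_norm_param(name: str) -> bool:
    # Assumption: normalization layers are named with a "bn" or "norm" path component.
    return any(part in ("bn", "norm") for part in name.split("."))


def aggregate(
    client_states: List[StateDict],
    client_sizes: List[int],
    shared_prefixes: List[str],
) -> StateDict:
    """Size-weighted average over parameters that are shared and not normalization.

    Parameters outside `shared_prefixes` and all normalization parameters are
    skipped, so they are neither transmitted nor overwritten.
    """
    total = sum(client_sizes)
    merged: StateDict = {}
    for name, reference in client_states[0].items():
        if is_norm_param(name):
            continue  # kept local to each client (view-specific statistics)
        if not any(name.startswith(prefix) for prefix in shared_prefixes):
            continue  # layer not selected for aggregation
        merged[name] = [
            sum(state[name][i] * size for state, size in zip(client_states, client_sizes)) / total
            for i in range(len(reference))
        ]
    return merged


def apply_update(local_state: StateDict, merged: StateDict) -> StateDict:
    """Overwrite only the aggregated parameters; everything else stays local."""
    updated = dict(local_state)
    updated.update(merged)
    return updated


# Two hypothetical clients (camera views) with different amounts of data.
clients = [
    {"backbone.conv.weight": [1.0, 2.0], "backbone.bn.weight": [0.5], "head.fc.weight": [10.0]},
    {"backbone.conv.weight": [3.0, 6.0], "backbone.bn.weight": [1.5], "head.fc.weight": [20.0]},
]
sizes = [1, 3]

merged = aggregate(clients, sizes, shared_prefixes=["backbone."])
new_client0 = apply_update(clients[0], merged)
```

In this toy run only `backbone.conv.weight` is averaged; each client keeps its own `backbone.bn.weight` and `head.fc.weight`, which is what reduces the payload relative to averaging every parameter.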