🤖 AI Summary
To address the challenge of collaborative learning among heterogeneous edge devices in multimodal sensing scenarios, this paper proposes Sheaf-DMFL, a decentralized framework that brings sheaf theory into distributed multimodal learning. Each client combines local per-modality feature encoders with a task-specific layer; encoders for the same modality are trained collaboratively across clients, while a sheaf-based structure models the dependencies among clients' task-specific layers. This enables cooperative training across devices with different modalities and model architectures. An enhanced variant, Sheaf-DMFL-Att, adds an attention mechanism within each client to capture correlations among modalities, and the paper provides a convergence analysis for it. In simulations on real-world link blockage prediction and millimeter-wave (mmWave) beamforming scenarios, the proposed algorithms outperform conventional federated learning baselines.
📝 Abstract
In large-scale communication systems, increasingly complex scenarios require more intelligent collaboration among edge devices that collect multimodal sensory data, in order to achieve a more comprehensive understanding of the environment and improve decision-making accuracy. However, conventional federated learning (FL) algorithms typically consider unimodal datasets, require identical model architectures, and fail to leverage the rich information embedded in multimodal data, which limits their applicability to real-world scenarios with diverse modalities and varying client capabilities. To address this issue, we propose Sheaf-DMFL, a novel decentralized multimodal learning framework that leverages sheaf theory to enhance collaboration among devices with diverse modalities. Specifically, each client has a set of local feature encoders for its different modalities, whose outputs are concatenated before passing through a task-specific layer. While encoders for the same modality are trained collaboratively across clients, we capture the intrinsic correlations among clients' task-specific layers using a sheaf-based structure. To further improve learning capability, we propose an enhanced algorithm named Sheaf-DMFL-Att, which tailors the attention mechanism within each client to capture correlations among different modalities. A rigorous convergence analysis of Sheaf-DMFL-Att is provided, establishing its theoretical guarantees. Extensive simulations on real-world link blockage prediction and mmWave beamforming scenarios demonstrate the superiority of the proposed algorithms in such heterogeneous wireless communication systems.
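The client structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the encoder type, the aggregation rule, or the exact form of the sheaf coupling, so the following are all assumptions made for illustration: linear ReLU encoders, plain averaging of same-modality encoders, restriction maps `P_ij` and `P_ji` that project each client's flattened task layer into a shared edge space, and every name in the code (`Client`, `average_shared_encoder`, `sheaf_penalty`).

```python
import numpy as np

rng = np.random.default_rng(0)


def make_encoder(in_dim, feat_dim):
    # Hypothetical stand-in for a modality encoder: a single linear map.
    return rng.normal(scale=0.1, size=(feat_dim, in_dim))


class Client:
    def __init__(self, modality_dims, feat_dim, out_dim):
        # One encoder per modality this client actually observes.
        self.encoders = {
            m: make_encoder(d, feat_dim) for m, d in modality_dims.items()
        }
        # The task-specific layer acts on the concatenated features, so its
        # width depends on how many modalities the client has.
        self.head = rng.normal(
            scale=0.1, size=(out_dim, feat_dim * len(modality_dims))
        )

    def forward(self, inputs):
        # Encode each modality (linear map + ReLU), concatenate, apply head.
        feats = [
            np.maximum(self.encoders[m] @ inputs[m], 0.0)
            for m in sorted(self.encoders)
        ]
        return self.head @ np.concatenate(feats)


def average_shared_encoder(clients, modality):
    # Assumed collaboration rule: clients that share a modality average
    # their encoder weights for it.
    owners = [c for c in clients if modality in c.encoders]
    mean_w = np.mean([c.encoders[modality] for c in owners], axis=0)
    for c in owners:
        c.encoders[modality] = mean_w.copy()


def sheaf_penalty(theta_i, theta_j, P_ij, P_ji):
    # Assumed sheaf-style coupling: task layers of different sizes are
    # compared only after projection into a common edge space.
    diff = P_ij @ theta_i - P_ji @ theta_j
    return float(diff @ diff)
```

In this sketch, a client with image and radar inputs and a client with radar only have task layers of different sizes, so their parameters cannot be averaged directly. The projection step in `sheaf_penalty` is what lets the two layers be compared; in a training loop, such a term would be added to each client's local loss.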