🤖 AI Summary
This work addresses gradient misalignment in multimodal decentralized federated learning (DFL). When heterogeneous agents, some unimodal and some multimodal, train a single unified embedding, their gradients conflict, and this impedes collaboration. To overcome this, the authors propose PARSE, a framework that applies Partial Information Decomposition (PID) to a server-free setting. PARSE uses feature fission to factorize each agent's latent representation into redundant, unique, and synergistic slices, and a slice-level partial alignment mechanism that exchanges only semantically shareable branches among agents that possess the corresponding modality. This enables peer-to-peer knowledge sharing without centralized coordination or gradient surgery. Experiments across benchmarks and agent mixes show consistent gains over task-level, modality-level, and hybrid sharing baselines. Ablations on fusion operators and split ratios, together with qualitative visualizations, support the efficiency and robustness of the design.
📝 Abstract
Multimodal decentralized federated learning (DFL) is challenging because agents differ in available modalities and model architectures, yet must collaborate over peer-to-peer (P2P) networks without a central coordinator. Standard multimodal pipelines learn a single shared embedding across all modalities. In DFL, such a monolithic representation induces gradient misalignment between uni- and multimodal agents; as a result, it suppresses heterogeneous sharing and cross-modal interaction. We present PARSE, a multimodal DFL framework that operationalizes partial information decomposition (PID) in a server-free setting. Each agent performs feature fission to factorize its latent representation into redundant, unique, and synergistic slices. P2P knowledge sharing among heterogeneous agents is enabled by slice-level partial alignment: only semantically shareable branches are exchanged among agents that possess the corresponding modality. By removing the need for central coordination and gradient surgery, PARSE resolves uni-/multimodal gradient conflicts, thereby overcoming the multimodal DFL dilemma while remaining compatible with standard DFL constraints. Across benchmarks and agent mixes, PARSE yields consistent gains over task-, modality-, and hybrid-sharing DFL baselines. Ablations on fusion operators and split ratios, together with qualitative visualizations, further demonstrate the efficiency and robustness of the proposed design.
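The two mechanisms named in the abstract, feature fission and slice-level partial alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how slices are formed or which branches count as shareable. The sketch assumes that fission is a fixed split of a latent vector by ratio and that alignment is a plain average of a modality branch over neighbors that own the same modality. All names (`fission`, `partial_align`, the agent and modality labels) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Branches = Dict[str, Dict[str, List[float]]]  # agent -> modality -> branch parameters


def fission(z: Sequence[float], ratios: Tuple[float, float, float]) -> Dict[str, List[float]]:
    """Split a latent vector into redundant / unique / synergistic slices.

    Assumption: a fixed split by ratio stands in for the learned
    factorization described in the abstract.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("split ratios must sum to 1")
    n = len(z)
    r_end = round(n * ratios[0])
    u_end = r_end + round(n * ratios[1])
    return {
        "redundant": list(z[:r_end]),
        "unique": list(z[r_end:u_end]),
        "synergistic": list(z[u_end:]),
    }


def partial_align(branches: Branches, neighbors: Dict[str, List[str]]) -> Branches:
    """Average each modality branch only with neighbors that own that modality.

    Assumption: simple averaging stands in for the paper's alignment rule.
    An agent never receives a branch for a modality it does not have.
    """
    updated: Branches = {}
    for agent, own in branches.items():
        updated[agent] = {}
        for modality, params in own.items():
            group = [params] + [
                branches[peer][modality]
                for peer in neighbors.get(agent, [])
                if modality in branches[peer]
            ]
            # Element-wise mean over the agent and its matching peers.
            updated[agent][modality] = [sum(col) / len(group) for col in zip(*group)]
    return updated


# Toy peer-to-peer setup: one multimodal agent and two unimodal agents.
branches = {
    "a": {"image": [1.0, 3.0], "text": [2.0, 2.0]},
    "b": {"image": [3.0, 5.0]},
    "c": {"text": [4.0, 0.0]},
}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}

slices = fission([0.1 * i for i in range(8)], (0.5, 0.25, 0.25))
aligned = partial_align(branches, neighbors)
```

In this toy run, the image-only agent `b` exchanges only the image branch with `a`, and the text-only agent `c` exchanges only the text branch, so no agent is pushed toward parameters for a modality it lacks. No server appears anywhere: each agent reads only from its own neighbor list.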