Group Cognition Learning: Making Everything Better Through Governed Two-Stage Agents Collaboration

📅 2026-04-30
📈 Citations: 0
Influential citations: 0

🤖 AI Summary
This work addresses modality dominance and spurious modality coupling in multimodal learning by proposing Group Cognition Learning (GCL), a two-stage fusion paradigm governed by multiple collaborating agents. In the first stage, a routing agent and an auditing agent enable selective cross-modal interactions. In the second stage, a public-factor agent maintains an explicit shared factor and an aggregation agent produces the consensus prediction, while modality-specific representations are preserved. The method combines sample-level gating, which regulates modality interactions, with contribution-aware weighting, which sets the fusion weights. On the CMU-MOSI, CMU-MOSEI, and MIntRec benchmarks, the authors report state-of-the-art results on both regression and classification tasks and find that GCL mitigates dominance and coupling.
📝 Abstract
Centralized multimodal learning commonly compresses language, acoustic, and visual signals into a single fused representation for prediction. While effective, this paradigm suffers from two limitations: modality dominance, where optimization gravitates towards the path of least resistance, ignoring weaker but informative modalities, and spurious modality coupling, where models overfit to incidental cross-modal correlations. To address these, we propose Group Cognition Learning (GCL), a governed collaboration paradigm that applies a two-stage protocol after modality-specific encoding. In Stage 1 (Selective Interaction), a Routing Agent proposes directed interaction routes, and an Auditing Agent assigns sample-wise gates to emphasize exchanges that yield positive marginal predictive gain while suppressing redundant coupling. In Stage 2 (Consensus Formation), a Public-Factor Agent maintains an explicit shared factor, and an Aggregation Agent produces the final prediction through contribution-aware weighting while keeping each modality representation as a specialization channel. Extensive experiments on CMU-MOSI, CMU-MOSEI, and MIntRec demonstrate that GCL mitigates dominance and coupling, establishing state-of-the-art results across both regression and classification benchmarks. Analysis experiments further demonstrate the effectiveness of the design.
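The two-stage protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names (`selective_interaction`, `consensus_formation`), the additive messages, the mean used as a stand-in for the shared factor, the softmax over contribution scores, and the linear prediction head are all assumptions made for illustration. In the paper the routes, gates, and contribution scores come from learned agents; here they are passed in as plain inputs for a single sample.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def selective_interaction(feats, routes, gate_logits):
    """Stage 1 (assumed form): gated messages along directed routes.

    feats:       {modality name: feature vector} for one sample
    routes:      list of (src, dst) pairs, standing in for the Routing Agent
    gate_logits: {(src, dst): float}, standing in for the Auditing Agent
    """
    updated = {name: list(vec) for name, vec in feats.items()}
    for src, dst in routes:
        gate = sigmoid(gate_logits[(src, dst)])  # sample-wise gate in (0, 1)
        # Messages are read from the original features, so route order
        # does not change the result.
        updated[dst] = [d + gate * s for d, s in zip(updated[dst], feats[src])]
    return updated


def consensus_formation(feats, head, contribution_scores):
    """Stage 2 (assumed form): shared factor plus contribution-aware weighting.

    Each modality keeps its own representation and makes its own prediction;
    only the scalar predictions are combined.
    """
    names = sorted(feats)
    dim = len(feats[names[0]])
    # Stand-in for the Public-Factor Agent: the mean over modalities.
    shared = [sum(feats[n][i] for n in names) / len(names) for i in range(dim)]
    # Stand-in for the Aggregation Agent: softmax-normalized contributions.
    weights = softmax([contribution_scores[n] for n in names])
    preds = [dot(head, [f + s for f, s in zip(feats[n], shared)]) for n in names]
    final = sum(w * p for w, p in zip(weights, preds))
    return final, dict(zip(names, weights)), shared


if __name__ == "__main__":
    sample = {
        "language": [1.0, 0.0],
        "acoustic": [0.0, 1.0],
        "visual": [1.0, 1.0],
    }
    routes = [("language", "acoustic")]
    gates = {("language", "acoustic"): 0.0}
    stage1 = selective_interaction(sample, routes, gates)
    scores = {"language": 0.0, "acoustic": 0.0, "visual": 0.0}
    prediction, weights, shared = consensus_formation(stage1, [1.0, 1.0], scores)
    print(prediction, weights)
```

A strongly negative gate logit effectively closes a route, which is how this sketch mimics suppressing redundant coupling. A low contribution score shrinks a modality's weight without discarding its representation.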
Problem

Research questions and friction points this paper is trying to address.

modality dominance
spurious modality coupling
multimodal learning
cross-modal correlation
fused representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Group Cognition Learning
Modality Dominance
Spurious Modality Coupling
Two-Stage Agent Collaboration
Multimodal Fusion