SDG-MoE: Signed Debate Graph Mixture-of-Experts

📅 2026-05-08

🤖 AI Summary
This work addresses a limitation of sparse Mixture-of-Experts (MoE) models: once a token is routed, the selected experts process it independently and their outputs are merged by a weighted sum, with no interaction among them. The authors propose a lightweight, iterative deliberation step between routing and final aggregation. It builds a signed debate graph over the active experts, with a support graph $A^+$ and a critique graph $A^-$, so that experts can reinforce and correct one another through signed message passing. A disagreement-gated, Friedkin-Johnsen-style anchoring scales deliberation strength with expert disagreement and prevents expert drift, which preserves specialization. The theoretical analysis establishes stability conditions on expert states and shows that deliberation adds only low-order overhead over the active set. In controlled three-seed pretraining experiments, SDG-MoE improves validation perplexity over both vanilla MoE and an unsigned graph communication baseline, outperforming the strongest baseline by 19.8%, and gives the best external perplexity on WikiText-103, C4, and Paloma among the compared systems.
📝 Abstract
Sparse MoE models achieve a good balance between capacity and compute by routing each token to a small subset of experts. However, in most MoE architectures, once a token is routed, the selected experts process it independently and their outputs are combined via a weighted sum. This leaves open whether enabling communication among them could improve performance. While prior work has raised this question, direct interaction among the active routed experts remains underexplored. In this paper, we propose SDG-MoE (Signed Debate Graph Mixture-of-Experts), a novel architecture that adds a lightweight, iterative deliberation step before final aggregation. SDG-MoE introduces three components: (i) two learned interaction matrices over the active experts, a support graph $A^+$ and a critique graph $A^-$, capturing reinforcing and corrective influences; (ii) a signed message-passing step that updates expert representations before aggregation; and (iii) a disagreement-gated Friedkin-Johnsen-style anchoring that controls deliberation strength while preventing expert drift. Together, these enable a structured deliberation process where interaction strength scales with disagreement and specialization is preserved. We also provide a theoretical analysis establishing stability conditions on expert states and showing that deliberation adds only low-order overhead over the active set. In controlled three-seed pretraining experiments, SDG-MoE improves validation perplexity over both an unsigned graph communication baseline and vanilla MoE, outperforming the strongest baseline by 19.8%, and gives the best external perplexity on WikiText-103, C4, and Paloma among the compared systems.
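The three components listed in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no equations, so the disagreement measure (mean squared distance from the gate-weighted consensus), the `tanh` squashing, and the names `signed_debate`, `random_signed_graphs`, and `lam_max` are all assumptions made for illustration. The update follows the general Friedkin-Johnsen form, in which each state is a blend of its initial value and a message from its neighbours.

```python
import numpy as np

def signed_debate(h0, gate, a_pos, a_neg, steps=2, lam_max=0.5):
    """Hypothetical deliberation step over the k active experts of one token.

    h0:    (k, d) expert outputs before deliberation (the anchors).
    gate:  (k,) router weights, assumed to sum to 1.
    a_pos: (k, k) nonnegative support graph (stand-in for A^+).
    a_neg: (k, k) nonnegative critique graph (stand-in for A^-).
    """
    h = h0.copy()
    for _ in range(steps):
        # Assumed disagreement measure: spread around the gated consensus.
        consensus = gate @ h
        disagreement = np.mean((h - consensus) ** 2)
        # Deliberation strength grows with disagreement, capped below lam_max.
        lam = lam_max * np.tanh(disagreement)
        # Signed message passing: support adds, critique subtracts.
        message = (a_pos - a_neg) @ h
        # Friedkin-Johnsen-style anchoring to the original expert outputs.
        h = (1.0 - lam) * h0 + lam * message
    # Final aggregation is the usual gate-weighted sum.
    return gate @ h

def random_signed_graphs(k, rng):
    """Toy stand-in for learned graphs; rows of a_pos + a_neg sum to 1."""
    raw = rng.normal(size=(k, k))
    scale = np.abs(raw).sum(axis=1, keepdims=True)
    return np.maximum(raw, 0) / scale, np.maximum(-raw, 0) / scale
```

Under these assumptions, identical expert outputs give zero disagreement, so the step reduces to the vanilla weighted sum. Because each row of `a_pos + a_neg` sums to at most 1 and the anchoring weight stays below 1, the deliberated states cannot exceed the largest magnitude in `h0`, which is one simple way to see a stability condition of the kind the abstract mentions. The extra cost is a `(k, k)` by `(k, d)` product per step over the active set only.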
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
expert communication
sparse MoE
token routing
model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
Signed Graph
Expert Deliberation
Message Passing
Sparse MoE