Modality-Guided Mixture of Graph Experts with Entropy-Triggered Routing for Multimodal Recommendation

📅 2026-02-24
🤖 AI Summary
This work addresses multimodal recommendation under sparse user feedback and long-tailed data distributions, where representation entanglement and modality imbalance often degrade performance. To mitigate these issues, the authors propose MAGNET, a modality-guided mixture-of-experts architecture over graphs whose experts take three explicit modality roles: dominant, balanced, and complementary. MAGNET uses an entropy-triggered two-stage routing mechanism that shifts training from expert coverage toward expert specialization, and a dual-view graph learning module that augments the interaction graph with content-induced edges. By combining interaction-conditioned routing with structure-aware augmentation, the framework aims for adaptive and interpretable multimodal fusion. The authors report that MAGNET outperforms state-of-the-art methods across multiple benchmark datasets, with notable gains in sparse and long-tailed scenarios.
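As a rough illustration of what "content-induced edges" could look like, the sketch below links each item to its most similar items by cosine similarity over content features. The k-nearest-neighbor construction, the function name `content_knn_edges`, and the toy feature matrix are all assumptions made for illustration; the summary does not state how MAGNET builds these edges or how they are merged with the interaction graph.

```python
import numpy as np

def content_knn_edges(item_features, k):
    """Return undirected item-item edges linking each item to its k most
    cosine-similar items. Hypothetical construction, not taken from the paper."""
    norms = np.linalg.norm(item_features, axis=1, keepdims=True)
    unit = item_features / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sim = unit @ unit.T                                 # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                      # exclude self-loops
    neighbors = np.argsort(-sim, axis=1)[:, :k]         # top-k neighbors per item
    edges = set()
    for i, row in enumerate(neighbors):
        for j in row:
            edges.add((min(i, int(j)), max(i, int(j))))  # store each pair once
    return edges

# Toy content features: items 0 and 1 are alike, items 2 and 3 are alike.
features = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
edges = content_knn_edges(features, k=1)
print(sorted(edges))
```

Edges of this kind would give a sparsely interacted item extra neighbors, which is the coverage benefit the summary attributes to the dual-view module.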

📝 Abstract
Multimodal recommendation enhances ranking by integrating user-item interactions with item content, which is particularly effective under sparse feedback and long-tail distributions. However, multimodal signals are inherently heterogeneous and can conflict in specific contexts, making effective fusion both crucial and challenging. Existing approaches often rely on shared fusion pathways, leading to entangled representations and modality imbalance. To address these issues, we propose MAGNET, a Modality-Guided Mixture of Adaptive Graph Experts Network with Progressive Entropy-Triggered Routing for Multimodal Recommendation, designed to enhance controllability, stability, and interpretability in multimodal fusion. MAGNET couples interaction-conditioned expert routing with structure-aware graph augmentation, so that both what to fuse and how to fuse are explicitly controlled and interpretable. At the representation level, a dual-view graph learning module augments the interaction graph with content-induced edges, improving coverage for sparse and long-tail items while preserving collaborative structure via parallel encoding and lightweight fusion. At the fusion level, MAGNET employs structured experts with explicit modality roles (dominant, balanced, and complementary), enabling a more interpretable and adaptive combination of behavioral, visual, and textual cues. To further stabilize sparse routing and prevent expert collapse, we introduce a two-stage entropy-weighting mechanism that monitors routing entropy. This mechanism automatically transitions training from an early coverage-oriented regime to a later specialization-oriented regime, progressively balancing expert utilization and routing confidence. Extensive experiments on public benchmarks demonstrate consistent improvements over strong baselines.
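The two-stage entropy mechanism described above can be pictured with a minimal sketch. Everything specific here is an assumption rather than the paper's formulation: the class name `EntropyTrigger`, the 0.9 threshold, the choice to trigger on the entropy of batch-averaged expert usage, and the two regularizer terms. The sketch only shows the general idea of monitoring routing entropy and switching from a coverage-oriented to a specialization-oriented objective.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def entropy(p):
    return -np.sum(p * np.log(p + 1e-12), axis=-1)

class EntropyTrigger:
    """Hypothetical two-stage routing regularizer (not the paper's exact rule)."""

    def __init__(self, num_experts, threshold=0.9):
        self.max_entropy = np.log(num_experts)
        self.threshold = threshold   # assumed trigger level on normalized entropy
        self.specializing = False    # False: coverage stage, True: specialization

    def regularizer(self, gate_logits):
        probs = softmax(gate_logits)                   # shape (batch, experts)
        usage = probs.mean(axis=0)                     # average load per expert
        usage_entropy = entropy(usage) / self.max_entropy
        sample_entropy = entropy(probs).mean() / self.max_entropy
        # Once experts are used evenly enough, switch stages for good.
        if not self.specializing and usage_entropy >= self.threshold:
            self.specializing = True
        if self.specializing:
            return sample_entropy    # minimizing this sharpens per-sample routing
        return -usage_entropy        # minimizing this spreads load across experts

trigger = EntropyTrigger(num_experts=3)
collapsed = np.array([[5.0, 0.0, 0.0], [5.0, 0.0, 0.0]])   # all traffic to expert 0
balanced = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]])

stage1_loss = trigger.regularizer(collapsed)   # coverage stage, trigger not met
stage1_flag = trigger.specializing
stage2_loss = trigger.regularizer(balanced)    # usage is even, trigger fires
stage2_flag = trigger.specializing
```

In a training loop, a term like this would be added to the recommendation loss with a small weight; the one-way switch is a design choice in this sketch so that the objective does not oscillate between stages.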
Problem

Research questions and friction points this paper is trying to address.

multimodal recommendation
modality heterogeneity
representation entanglement
modality imbalance
fusion interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-Guided Mixture of Experts
Entropy-Triggered Routing
Multimodal Recommendation
Graph Augmentation
Interpretable Fusion
Ji Dai
Beijing University of Posts and Telecommunications, China
Quan Fang
Ph.D., Institute of Automation of the Chinese Academy of Sciences (CASIA)
Knowledge Graph, Data Mining, Multimedia, Social Media
DeSheng Cai
Tianjin University of Technology, China