🤖 AI Summary
To address the challenges of modeling temporal dynamics, distribution shifts between heterogeneous features, and severe noise interference in multimodal sequential recommendation, this paper proposes FindRec, a unified information disentanglement framework. Methodologically: (1) a Stein kernel-based Integrated Information Coordination Module (IICM) aligns the distributions of multimodal features and ID embeddings; (2) a cross-modal expert routing mechanism adaptively selects and fuses features according to their contextual relevance; (3) a hybrid architecture combines multi-head subspace decomposition, RBF-based Stein gradient estimation, and linear-complexity Mamba layers under an "information flow-control-output" paradigm to balance modeling efficiency and stability. Evaluated on three real-world datasets, the model outperforms state-of-the-art baselines, particularly on long sequences and noisy multimodal inputs, with improved recommendation accuracy and enhanced interpretability.
📝 Abstract
Modern recommendation systems face significant challenges in processing multimodal sequential data, particularly in modeling temporal dynamics and coordinating information flow. Traditional approaches struggle with distribution discrepancies between heterogeneous features and with noise interference in multimodal signals. We propose FindRec (Flexible unified information disentanglement for multi-modal sequential Recommendation), which introduces a novel "information flow-control-output" paradigm. The framework features two key innovations: (1) a Stein kernel-based Integrated Information Coordination Module (IICM) that theoretically guarantees distribution consistency between multimodal features and ID streams, and (2) a cross-modal expert routing mechanism that adaptively filters and combines multimodal features based on their contextual relevance. Our approach uses multi-head subspace decomposition for routing stability and an RBF-Stein gradient for unbiased distribution alignment, together with linear-complexity Mamba layers for efficient temporal modeling. Extensive experiments on three real-world datasets demonstrate FindRec's superior performance over state-of-the-art baselines, particularly in handling long sequences and noisy multimodal inputs. Through its modular design, the framework achieves both improved recommendation accuracy and enhanced model interpretability. The implementation code is available online for easy reproducibility: https://github.com/Applied-Machine-Learning-Lab/FindRec
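The abstract does not spell out how the RBF-Stein gradient is computed. As a rough illustration of the general technique only, the sketch below implements a standard kernelized Stein score estimator with an RBF kernel: given samples from an unknown distribution q, it estimates the score (the gradient of log q) at each sample. The function name `rbf_stein_score`, the median bandwidth heuristic, and the regularizer `eta` are assumptions made for this example and are not taken from the FindRec code.

```python
import numpy as np

def rbf_stein_score(x, eta=1.0):
    """Estimate grad log q(x_i) for each row of x (shape [M, d]).

    Hypothetical helper: a generic Stein gradient estimator,
    G = -(K + eta * I)^{-1} <grad, K>, not FindRec's own module.
    """
    diff = x[:, None, :] - x[None, :, :]       # diff[i, j] = x_i - x_j
    sq_dist = np.sum(diff ** 2, axis=-1)       # pairwise squared distances, [M, M]
    sigma2 = np.median(sq_dist) + 1e-8         # median bandwidth heuristic (assumed)
    K = np.exp(-sq_dist / (2.0 * sigma2))      # RBF kernel matrix
    # grad_K[i] = sum_j d/dx_j k(x_i, x_j) = sum_j K[i, j] * (x_i - x_j) / sigma2
    grad_K = np.einsum("ij,ijd->id", K, diff) / sigma2
    # Ridge-regularized solve of Stein's identity for the score at the samples
    return -np.linalg.solve(K + eta * np.eye(len(x)), grad_K)

# Toy check on samples from a standard normal, whose true score is -x
rng = np.random.default_rng(0)
samples = rng.standard_normal((200, 1))
score = rbf_stein_score(samples)
```

In an alignment setting, an estimate like this could in principle act as a training signal that pulls one feature distribution toward another. How FindRec wires it into the IICM is not described in the abstract.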