🤖 AI Summary
Existing single-network approaches struggle to capture the diversity of user behavior, while conventional ensemble methods suffer from high training costs and sensitivity to noise. To address these limitations, this work proposes FLAME, a modular ensemble framework that efficiently condenses exponential ensemble diversity into a single network. FLAME introduces a pretrained, frozen network as a semantic anchor during training and enables guided mutual learning between this anchor and a trainable network. This design achieves stable and effective ensemble performance without incurring any additional inference overhead. Extensive experiments demonstrate that FLAME consistently outperforms state-of-the-art methods across six benchmark datasets, yielding up to a 9.70% relative improvement in NDCG@20 and accelerating convergence by up to 7.69×.
📝 Abstract
Sequential recommendation must capture diverse user behaviors, which a single network often fails to model. While ensemble methods mitigate this by leveraging multiple networks, training them all from scratch incurs high computational cost and instability from noisy mutual supervision. We propose {\bf F}rozen and {\bf L}earnable networks with {\bf A}ligned {\bf M}odular {\bf E}nsemble ({\bf FLAME}), a novel framework that condenses ensemble-level diversity into a single network for efficient sequential recommendation. During training, FLAME simulates exponential diversity using only two networks via {\it modular ensemble}: by decomposing each network into sub-modules (e.g., layers or blocks) and dynamically combining them, FLAME generates a rich space of diverse representation patterns. To stabilize this process, we pretrain and freeze one network to serve as a semantic anchor and employ {\it guided mutual learning}, which aligns the diverse representations into the space of the remaining learnable network, ensuring robust optimization. At inference, FLAME therefore uses only the learnable network, achieving ensemble-level performance with zero overhead relative to a single network. Experiments on six datasets show that FLAME outperforms state-of-the-art baselines, converging up to 7.69$\times$ faster and improving NDCG@20 by up to 9.70\%. We provide the source code of FLAME at https://github.com/woo-joo/FLAME_SIGIR26.
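To make the modular-ensemble idea concrete, here is a minimal, stdlib-only sketch (not the paper's implementation; all names and the toy sub-modules are assumptions). Two networks are each decomposed into `L` sub-modules; a binary mask drawn per forward pass selects, at each depth, whether the frozen or the learnable sub-module is used, so only two sets of weights can realize up to 2^L distinct composite networks:

```python
# Hypothetical sketch of a modular ensemble over a frozen and a learnable
# network. Real FLAME sub-modules would be neural layers; here each one is
# a simple deterministic transform so the combinatorics are easy to verify.
L = 4  # number of sub-modules (layers/blocks) per network

# Frozen anchor network: sub-module i adds 10**i to the representation.
frozen_modules = [lambda x, i=i: x + 10 ** i for i in range(L)]
# Learnable network: sub-module i adds 2 * 10**i instead.
learnable_modules = [lambda x, i=i: x + 2 * 10 ** i for i in range(L)]

def modular_forward(x, mask):
    """Route through the frozen or learnable sub-module at each depth,
    according to the binary mask (1 = frozen, 0 = learnable)."""
    for i, use_frozen in enumerate(mask):
        x = frozen_modules[i](x) if use_frozen else learnable_modules[i](x)
    return x

# Enumerate all 2**L masks; in training one would sample a mask per step.
all_masks = [[(m >> i) & 1 for i in range(L)] for m in range(2 ** L)]
outputs = {modular_forward(0, mask) for mask in all_masks}
print(len(outputs))  # 16 distinct composite networks from just two weight sets
```

In the toy setup every mask yields a different output, illustrating how two networks span an exponential space of representation patterns; guided mutual learning would then distill these patterns into the learnable network so that inference needs only that single path.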