Enhancing CTR Prediction with De-correlated Expert Networks

📅 2025-05-23
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address the performance bottleneck in CTR estimation caused by expert homogenization in Mixture-of-Experts (MoE) models, this paper proposes a de-correlated MoE framework that explicitly regularizes output correlations among experts to increase modeling diversity. Methodologically, it introduces: (1) a Cross-Expert De-Correlation loss; (2) a quantifiable Cross-Expert Correlation metric; and (3) empirical evidence that different de-correlation strategies (e.g., multi-embedding tables combined with heterogeneous experts) are mutually compatible and, when applied progressively, further reduce expert correlation while consistently improving CTR performance. Offline experiments show that lower cross-expert correlation consistently corresponds to better CTR evaluation metrics. Online A/B tests on Tencent's advertising platform demonstrate a 1.19% GMV lift over the Multi-Embedding MoE baseline.
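
The page does not give the exact formulation of the Cross-Expert De-Correlation loss; the PyTorch sketch below shows one plausible way such a term could be implemented, penalizing the average absolute pairwise correlation between the experts' outputs within a batch. The function name, tensor shapes, and the specific formulation are assumptions for illustration, not taken from the paper.

```python
import torch

def decorrelation_loss(expert_outputs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Penalize pairwise correlation among expert outputs (illustrative sketch).

    expert_outputs: (num_experts, batch_size), e.g. each expert's scalar
    logit for every example in the batch.
    """
    # Center and L2-normalize each expert's outputs across the batch.
    centered = expert_outputs - expert_outputs.mean(dim=1, keepdim=True)
    normed = centered / (centered.norm(dim=1, keepdim=True) + eps)
    # Pairwise Pearson correlations between experts: (num_experts, num_experts).
    corr = normed @ normed.t()
    num_experts = corr.size(0)
    # Average the absolute off-diagonal entries (self-correlation excluded).
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.abs().sum() / (num_experts * (num_experts - 1))
```

In training, such an auxiliary term would typically be added to the standard CTR objective with a tunable weight, e.g. `loss = bce_loss + lambda_dc * decorrelation_loss(stacked_expert_logits)`, where `lambda_dc` and the stacked logits tensor are hypothetical names used here for illustration.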

📝 Abstract
Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert architectures, to differentiate the experts, which we refer to as expert de-correlation. However, it remains unclear whether these strategies effectively achieve de-correlated experts. To address this, we propose a De-Correlated MoE (D-MoE) framework, which introduces a Cross-Expert De-Correlation loss to minimize expert correlations. Additionally, we propose a novel metric, termed Cross-Expert Correlation, to quantitatively evaluate the expert de-correlation degree. Based on this metric, we identify a key finding for MoE framework design: different de-correlation strategies are mutually compatible, and progressively employing them leads to reduced correlation and enhanced performance. Extensive experiments have been conducted to validate the effectiveness of D-MoE and the de-correlation principle. Moreover, online A/B testing on Tencent's advertising platforms demonstrates that D-MoE achieves a significant 1.19% Gross Merchandise Volume (GMV) lift compared to the Multi-Embedding MoE baseline.
Problem

Research questions and friction points this paper is trying to address.

Improving CTR prediction via de-correlated expert networks
Measuring expert de-correlation with a novel metric (see the sketch after this list)
Validating performance gains through experiments and A/B testing
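
The page names the Cross-Expert Correlation metric but not its definition; the sketch below assumes it is the mean absolute pairwise Pearson correlation of expert predictions over an evaluation set, so that lower values indicate more diverse experts. The function name and this definition are assumptions, not the paper's exact metric.

```python
import numpy as np

def cross_expert_correlation(expert_outputs: np.ndarray) -> float:
    """Mean absolute pairwise Pearson correlation between experts (assumed form).

    expert_outputs: (num_experts, num_samples), e.g. each expert's predictions
    over an evaluation set. Lower values indicate more de-correlated experts.
    """
    corr = np.corrcoef(expert_outputs)         # rows are treated as variables
    mask = ~np.eye(corr.shape[0], dtype=bool)  # keep off-diagonal entries only
    return float(np.abs(corr[mask]).mean())
```
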
Innovation

Methods, ideas, or system contributions that make the work stand out.

De-Correlated MoE framework minimizes expert correlations
Cross-Expert De-Correlation loss reduces expert dependencies
Progressively combining strategies enhances de-correlation performance
Jiancheng Wang
University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Mingjia Yin
University of Science and Technology of China
Recommender system, Data-centric AI
Junwei Pan
Tencent, Yahoo Research
Computational Advertising, Recommendation System, Deep Learning
Ximei Wang
Tencent Inc.
Hao Wang
University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning