Enhancing CTR Prediction with De-correlated Expert Networks

📅 2025-05-23
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
To address the performance bottleneck in CTR estimation caused by expert homogenization in Mixture-of-Experts (MoE) models, this paper proposes a de-correlated MoE framework that explicitly regularizes output correlations among experts to increase modeling diversity. Methodologically, it introduces: (1) a Cross-Expert De-Correlation loss; (2) a quantifiable Cross-Expert Correlation metric; and (3) empirical evidence that different de-correlation strategies (e.g., multi-embedding tables combined with heterogeneous experts) are mutually compatible and, when applied progressively, further reduce expert correlation while consistently improving CTR performance. Offline experiments show that lower cross-expert correlation consistently corresponds to better CTR evaluation metrics. Online A/B tests on Tencent's advertising platform demonstrate a 1.19% GMV lift over the Multi-Embedding MoE baseline.
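
The page does not give the exact formulation of the Cross-Expert De-Correlation loss; the PyTorch sketch below shows one plausible way such a term could be implemented, penalizing the average absolute pairwise correlation between the experts' outputs within a batch. The function name, tensor shapes, and the specific formulation are assumptions for illustration, not taken from the paper.

```python
import torch

def decorrelation_loss(expert_outputs: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Penalize pairwise correlation among expert outputs (illustrative sketch).

    expert_outputs: (num_experts, batch_size), e.g. each expert's scalar
    logit for every example in the batch.
    """
    # Center and L2-normalize each expert's outputs across the batch.
    centered = expert_outputs - expert_outputs.mean(dim=1, keepdim=True)
    normed = centered / (centered.norm(dim=1, keepdim=True) + eps)
    # Pairwise Pearson correlations between experts: (num_experts, num_experts).
    corr = normed @ normed.t()
    num_experts = corr.size(0)
    # Average the absolute off-diagonal entries (self-correlation excluded).
    off_diag = corr - torch.diag(torch.diag(corr))
    return off_diag.abs().sum() / (num_experts * (num_experts - 1))
```

In training, such an auxiliary term would typically be added to the standard CTR objective with a tunable weight, e.g. `loss = bce_loss + lambda_dc * decorrelation_loss(stacked_expert_logits)`, where `lambda_dc` and the stacked logits tensor are hypothetical names used here for illustration.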

📝 Abstract
Modeling feature interactions is essential for accurate click-through rate (CTR) prediction in advertising systems. Recent studies have adopted the Mixture-of-Experts (MoE) approach to improve performance by ensembling multiple feature interaction experts. These studies employ various strategies, such as learning independent embedding tables for each expert or utilizing heterogeneous expert architectures, to differentiate the experts, which we refer to as expert de-correlation. However, it remains unclear whether these strategies effectively achieve de-correlated experts. To address this, we propose a De-Correlated MoE (D-MoE) framework, which introduces a Cross-Expert De-Correlation loss to minimize expert correlations. Additionally, we propose a novel metric, termed Cross-Expert Correlation, to quantitatively evaluate the expert de-correlation degree. Based on this metric, we identify a key finding for MoE framework design: different de-correlation strategies are mutually compatible, and progressively employing them leads to reduced correlation and enhanced performance. Extensive experiments have been conducted to validate the effectiveness of D-MoE and the de-correlation principle. Moreover, online A/B testing on Tencent's advertising platforms demonstrates that D-MoE achieves a significant 1.19% Gross Merchandise Volume (GMV) lift compared to the Multi-Embedding MoE baseline.
Problem

Research questions and friction points this paper is trying to address.

Improving CTR prediction via de-correlated expert networks
Measuring expert de-correlation with a novel metric (see the sketch after this list)
Validating performance gains through experiments and A/B testing
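
The page names the Cross-Expert Correlation metric but not its definition; the sketch below assumes it is the mean absolute pairwise Pearson correlation of expert predictions over an evaluation set, so that lower values indicate more diverse experts. The function name and this definition are assumptions, not the paper's exact metric.

```python
import numpy as np

def cross_expert_correlation(expert_outputs: np.ndarray) -> float:
    """Mean absolute pairwise Pearson correlation between experts (assumed form).

    expert_outputs: (num_experts, num_samples), e.g. each expert's predictions
    over an evaluation set. Lower values indicate more de-correlated experts.
    """
    corr = np.corrcoef(expert_outputs)         # rows are treated as variables
    mask = ~np.eye(corr.shape[0], dtype=bool)  # keep off-diagonal entries only
    return float(np.abs(corr[mask]).mean())
```
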
Innovation

Methods, ideas, or system contributions that make the work stand out.

De-Correlated MoE framework minimizes expert correlations
Cross-Expert De-Correlation loss reduces expert dependencies
Progressively combining strategies enhances de-correlation performance
Jiancheng Wang
University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Mingjia Yin
University of Science and Technology of China
Recommender system, Data-centric AI
Junwei Pan
Tencent, Yahoo Research
Computational Advertising, Recommendation System, Deep Learning
Ximei Wang
Tencent Inc.
Hao Wang
University of Science and Technology of China & State Key Laboratory of Cognitive Intelligence
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning