🤖 AI Summary
This work addresses a limitation shared by existing generative CTR prediction methods: they assign the same reconstruction weight to every feature field and so ignore how much the fields differ in reconstruction difficulty. As a result, easily learned fields dominate training while harder but more informative fields are underfit. To remedy this, the authors propose a self-balancing learning framework built on learnable field-level difficulty parameters, which jointly drive a self-balancing loss function and a difficulty-guided attention mechanism. Because both components share the same learned signal, they reallocate gradient budget dynamically, suppressing well-converged fields and strengthening cross-field information flow toward difficult ones, without requiring additional hyperparameters. The difficulty parameters are trained jointly with the denoising network during discrete diffusion-based generative pre-training. On five benchmark datasets and in a seven-day online A/B test, the method shows consistent, statistically significant improvements over state-of-the-art approaches, with particularly notable gains for cold-start and long-tail users.
📝 Abstract
Generative pre-training via discrete diffusion provides dense reconstruction supervision across all feature fields simultaneously, mitigating representation collapse from data sparsity in CTR prediction. However, all existing generative CTR methods share a fundamental limitation: the reconstruction objective assigns equal training weight to every feature field, ignoring the profound heterogeneity of reconstruction difficulty across high-cardinality ID fields, sparse categorical attributes, numerical values, and behavioral sequences. This causes easy fields to dominate training gradients while the hardest but most informative fields remain chronically underfit, a problem we term the generative difficulty imbalance. We propose HeteGenCTR, which resolves this imbalance through per-field learnable difficulty parameters jointly trained with the denoising network. This unified signal drives two coordinated components without additional hyperparameters: a self-balancing loss that automatically reallocates gradient budget toward harder fields with a provably stable equilibrium, and a difficulty-guided attention mechanism that suppresses the influence of already-converged easy fields while amplifying cross-field information flow toward hard fields. Both components share the same learned signal and remain mutually consistent throughout training. Experiments on five CTR benchmarks and a seven-day online A/B test demonstrate consistent, statistically significant improvements over state-of-the-art baselines, with disproportionate gains for cold-start and long-tail users.
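The abstract does not give the exact form of the self-balancing loss, so the following is only a minimal sketch of one way a per-field difficulty signal could shift weight toward harder fields. Everything in it is an assumption made for illustration and not HeteGenCTR's actual formulation: the softmax weighting, the rule that moves each difficulty value toward the log of its field's reconstruction loss, the learning rate, and the function and field names.

```python
import math


def self_balancing_weights(difficulty):
    """Map per-field difficulty values to loss weights that average to 1.

    Assumed form: a softmax over difficulty, rescaled by the number of
    fields, so a field with higher difficulty receives a larger weight.
    """
    peak = max(difficulty)  # subtract the max for numerical stability
    exps = [math.exp(d - peak) for d in difficulty]
    total = sum(exps)
    return [len(difficulty) * e / total for e in exps]


def self_balancing_loss(field_losses, difficulty):
    """Weighted sum of per-field reconstruction losses."""
    weights = self_balancing_weights(difficulty)
    return sum(w * loss for w, loss in zip(weights, field_losses))


def update_difficulty(difficulty, field_losses, lr=0.1, eps=1e-12):
    """One gradient step on 0.5 * (d_f - log L_f)^2 for each field f.

    Assumed update rule: each difficulty value drifts toward the log of
    its field's current loss, so for fixed losses it settles at
    d_f = log L_f.
    """
    return [
        d - lr * (d - math.log(max(loss, eps)))
        for d, loss in zip(difficulty, field_losses)
    ]


# Hypothetical per-field reconstruction losses (illustrative numbers only).
field_names = ["numerical", "categorical", "item_id", "behavior_seq"]
field_losses = [0.2, 0.5, 2.0, 1.3]

difficulty = [0.0] * len(field_losses)  # start with uniform weighting
for _ in range(300):
    difficulty = update_difficulty(difficulty, field_losses)

for name, weight in zip(field_names, self_balancing_weights(difficulty)):
    print(f"{name}: weight {weight:.3f}")
```

With all difficulty values at zero the weights are uniform, which corresponds to the equal-weight objective the abstract criticizes. As the difficulty values settle, fields with larger reconstruction loss receive proportionally larger weights. In a real training loop the per-field losses would change as the denoising network learns, so the weights would keep adapting rather than converge to fixed values as they do in this toy example.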