From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models

📅 2025-12-15
🤖 AI Summary
Existing CTR prediction models predominantly rely on explicit feature interactions over ID embeddings, which often leads to embedding dimensional collapse and information redundancy. To address this, we propose the Supervised Feature Generation (SFG) framework, the first to shift CTR modeling from discriminative *feature interaction* to generative *feature generation*. SFG employs an encoder-decoder architecture to implicitly capture high-order feature relationships within the ID embedding latent space, using click labels as supervision signals to guide feature reconstruction. A supervised reconstruction loss is introduced to improve feature discriminability. The framework is plug-and-play and integrates with mainstream models, including DeepFM, DCN, and xDeepFM, without architectural modification. Experiments on benchmark datasets (Criteo, Ali-CCP) show consistent AUC improvements of 0.5–1.2%. The implementation is publicly available.

📝 Abstract
Click-Through Rate (CTR) prediction, a core task in recommendation systems, aims to estimate the probability of users clicking on items. Existing models predominantly follow a discriminative paradigm, which relies heavily on explicit interactions between raw ID embeddings. However, this paradigm makes them susceptible to two critical issues, embedding dimensional collapse and information redundancy, both stemming from the over-reliance on feature interactions *over raw ID embeddings*. To address these limitations, we propose a novel *Supervised Feature Generation (SFG)* framework, *shifting the paradigm from discriminative "feature interaction" to generative "feature generation"*. Specifically, SFG comprises two key components: an *Encoder* that constructs hidden embeddings for each feature, and a *Decoder* tasked with regenerating the feature embeddings of all features from these hidden representations. Unlike existing generative approaches that adopt self-supervised losses, we introduce a supervised loss to utilize the supervised signal, i.e., click or not, in the CTR prediction task. This framework exhibits strong generalizability: it can be seamlessly integrated with most existing CTR models, reformulating them under the generative paradigm. Extensive experiments demonstrate that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains across various datasets and base models. The code is available at https://github.com/USTC-StarTeam/GE4Rec.
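The encoder, decoder, and supervised loss described in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the per-field linear encoder with a tanh activation, the per-target linear decoder, the way generated embeddings are scored against raw embeddings, and every name and size below (`NUM_FIELDS`, `EMB_DIM`, `HID_DIM`, `click_logit`, `supervised_loss`) are assumptions chosen only to make the idea concrete. The one element taken from the abstract is the structure: hidden embeddings per feature, regeneration of all feature embeddings from them, and a loss driven by the click label rather than a self-supervised objective.

```python
import math
import random

random.seed(0)

# All sizes are hypothetical and chosen only for illustration.
NUM_FIELDS = 3  # number of feature fields
EMB_DIM = 4     # raw ID-embedding size
HID_DIM = 4     # hidden-embedding size


def rand_matrix(rows, cols, scale=0.1):
    """Small random weight matrix stored as a list of rows."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed encoder: one linear projection per source field.
ENC = [rand_matrix(HID_DIM, EMB_DIM) for _ in range(NUM_FIELDS)]
# Assumed decoder: one linear projection per target field.
DEC = [rand_matrix(EMB_DIM, HID_DIM) for _ in range(NUM_FIELDS)]


def encode(embeddings):
    """Build a hidden embedding for each feature field."""
    return [[math.tanh(v) for v in matvec(ENC[i], e)] for i, e in enumerate(embeddings)]


def decode(hidden):
    """generated[i][j] is the embedding of field j regenerated from hidden field i."""
    return [[matvec(DEC[j], h) for j in range(NUM_FIELDS)] for h in hidden]


def click_logit(embeddings):
    """Assumed scoring rule: match each generated embedding against the raw one."""
    generated = decode(encode(embeddings))
    return sum(
        dot(generated[i][j], embeddings[j])
        for i in range(NUM_FIELDS)
        for j in range(NUM_FIELDS)
    )


def supervised_loss(embeddings, click):
    """Binary cross-entropy on the click label, in a numerically stable form."""
    z = click_logit(embeddings)
    return max(z, 0.0) - z * click + math.log1p(math.exp(-abs(z)))


# Usage: one sample with random ID embeddings and a positive click label.
sample = [[random.gauss(0.0, 1.0) for _ in range(EMB_DIM)] for _ in range(NUM_FIELDS)]
print("logit:", click_logit(sample))
print("loss (click=1):", supervised_loss(sample, 1))
```

In this sketch the click label is the only training signal, in contrast to a self-supervised reconstruction target such as the squared error between generated and raw embeddings. How SFG actually combines generated embeddings with a base model such as DeepFM or DCN is not specified in the text above and is left open here.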
Problem

Research questions and friction points this paper is trying to address.

Addresses embedding collapse in CTR models
Reduces information redundancy from feature interactions
Shifts the paradigm from discriminative feature interaction to generative feature generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative feature generation replaces discriminative feature interaction
Supervised loss utilizes click signals for feature regeneration
Encoder-decoder framework integrates with existing CTR models

Mingjia Yin
University of Science and Technology of China
Recommender system, Data-centric AI
Junwei Pan
Tencent, Yahoo Research
Computational Advertising, Recommendation System, Deep Learning
Hao Wang
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, China
Ximei Wang
Tencent Inc, China
Shangyu Zhang
Tencent Inc, China
Jie Jiang
Tencent Inc, China
Defu Lian
State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China, Hefei, China
Enhong Chen
University of Science and Technology of China
data mining, recommender system, machine learning