Distribution-Aware End-to-End Embedding for Streaming Numerical Features in Click-Through Rate Prediction

📅 2026-02-03
🤖 AI Summary
This work addresses the limitations of traditional numerical feature embedding in streaming click-through rate (CTR) prediction: offline statistics often induce semantic drift, and existing end-to-end approaches overlook both the dynamic nature of feature distributions and field-wise contextual dependencies. The authors propose DAES, a framework that integrates dynamic distribution modeling and field-aware semantics within an end-to-end streaming training paradigm, which they describe as the first to do so. DAES uses reservoir sampling for efficient online estimation of evolving feature distributions and introduces two field-aware modulation strategies to adaptively refine embeddings. In extensive offline and online experiments, DAES significantly outperforms state-of-the-art methods, and it has been fully deployed on a short-video platform serving hundreds of millions of daily active users.
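To make the reservoir-sampling idea concrete, here is a minimal sketch of the classic Algorithm R used as an online distribution estimator. It is not the DAES implementation: the class name `Reservoir`, its `capacity` and `seed` parameters, and the `cdf` helper are hypothetical, and plain Algorithm R samples uniformly over the whole stream history, so any recency weighting or non-i.i.d. handling the paper applies is not shown.

```python
import bisect
import random


class Reservoir:
    """Fixed-size uniform sample of a stream (classic Algorithm R).

    Illustrative only; names and parameters are assumptions, not DAES's API.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.samples = []
        self.seen = 0
        self._rng = random.Random(seed)

    def add(self, value):
        """Offer one streamed value to the reservoir."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            # Fill phase: keep everything until the reservoir is full.
            self.samples.append(value)
            return
        # Replacement phase: the new value is kept with
        # probability capacity / seen, evicting a uniformly chosen slot.
        j = self._rng.randrange(self.seen)
        if j < self.capacity:
            self.samples[j] = value

    def cdf(self, value):
        """Empirical CDF of the reservoir, a proxy for the stream's CDF."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return bisect.bisect_right(ordered, value) / len(ordered)


reservoir = Reservoir(capacity=100, seed=1)
for x in range(10_000):
    reservoir.add(float(x))
print(len(reservoir.samples), reservoir.seen, reservoir.cdf(5000.0))
```

Memory stays bounded by `capacity` no matter how long the stream runs, which is what makes this family of samplers attractive for streaming training. The empirical CDF could then feed an embedding layer as a distribution-aware signal, though how DAES consumes it is not specified in this summary.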

📝 Abstract
This paper explores effective numerical feature embedding for Click-Through Rate prediction in streaming environments. Conventional static binning methods rely on offline statistics of numerical distributions; however, this inherently two-stage process often triggers semantic drift during bin boundary updates. While neural embedding methods enable end-to-end learning, they often discard explicit distributional information. Integrating such information end-to-end is challenging because streaming features often violate the i.i.d. assumption, precluding unbiased estimation of the population distribution via the expectation of order statistics. Furthermore, the critical context dependency of numerical distributions is often neglected. To this end, we propose DAES, an end-to-end framework designed to tackle numerical feature embedding in streaming training scenarios by integrating distributional information with an adaptive modulation mechanism. Specifically, we introduce an efficient reservoir-sampling-based distribution estimation method and two field-aware distribution modulation strategies to capture streaming distributions and field-dependent semantics. DAES significantly outperforms existing approaches as demonstrated by extensive offline and online experiments and has been fully deployed on a leading short-video platform with hundreds of millions of daily active users.
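The semantic drift attributed to static binning can be illustrated with a small sketch. It assumes equal-frequency (quantile) bins recomputed from two offline snapshots; the helper names `quantile_boundaries` and `bin_index` and the toy data are hypothetical and do not come from the paper.

```python
import bisect


def quantile_boundaries(values, n_bins):
    """Equal-frequency bin edges computed from an offline snapshot."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def bin_index(value, boundaries):
    """Index of the bin (and hence the embedding row) a raw value falls into."""
    return bisect.bisect_right(boundaries, value)


# Two offline snapshots of the same feature; the second has a wider range.
snapshot_old = list(range(100))                 # 0, 1, ..., 99
snapshot_new = [2 * x for x in range(100)]      # 0, 2, ..., 198

old_edges = quantile_boundaries(snapshot_old, n_bins=4)
new_edges = quantile_boundaries(snapshot_new, n_bins=4)

# The same raw value lands in a different bin after the boundary update,
# so it starts reading an embedding row that was trained on other values.
value = 60
print(bin_index(value, old_edges), bin_index(value, new_edges))
```

Because each bin index selects one row of an embedding table, a boundary refresh silently reassigns raw values to rows trained under the previous boundaries. This is the two-stage coupling that an end-to-end, distribution-aware approach aims to avoid.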
Problem

Research questions and friction points this paper is trying to address.

numerical feature embedding
click-through rate prediction
streaming environment
distribution awareness
semantic drift
Innovation

Methods, ideas, or system contributions that make the work stand out.

distribution-aware embedding
streaming numerical features
reservoir sampling
field-aware modulation
end-to-end CTR prediction
Authors

Jiahao Liu
Fudan University
Hongji Ruan
Beijing Jiaotong University
Weimin Zhang
Tencent
Ziye Tong
Tencent
Derick Tang
Tencent
Zhanpeng Zeng
University of Wisconsin Madison
Transformer Efficiency
Qinsong Zeng
Tencent
Peng Zhang
Fudan University
Tun Lu
Fudan University
Ning Gu
Fudan University
Collaborative Computing, CSCW, Social Computing, Human Computer Interaction, Recommendation