🤖 AI Summary
This work addresses two limitations of numerical feature embedding in streaming click-through rate (CTR) prediction: traditional methods rely on offline statistics, which often induce semantic drift, while existing end-to-end approaches overlook both the dynamic nature of feature distributions and field-wise contextual dependencies. The authors propose DAES, a framework that, for the first time, integrates dynamic distribution modeling and field-aware semantics within an end-to-end streaming training paradigm. DAES uses reservoir sampling to estimate evolving feature distributions online and introduces two field-aware modulation strategies that adaptively refine embeddings. Large-scale offline and online experiments show that DAES significantly outperforms state-of-the-art methods, and it has been fully deployed on a short-video platform serving hundreds of millions of daily active users.
📝 Abstract
This paper explores effective numerical feature embedding for click-through rate (CTR) prediction in streaming environments. Conventional static binning methods rely on offline statistics of numerical distributions; however, this inherently two-stage process often triggers semantic drift when bin boundaries are updated. While neural embedding methods enable end-to-end learning, they often discard explicit distributional information. Integrating such information end-to-end is challenging because streaming features often violate the i.i.d. assumption, which precludes unbiased estimation of the population distribution via the expectation of order statistics. Furthermore, the critical context dependency of numerical distributions is often neglected. To this end, we propose DAES, an end-to-end framework for numerical feature embedding in streaming training scenarios that integrates distributional information with an adaptive modulation mechanism. Specifically, we introduce an efficient reservoir-sampling-based distribution estimation method and two field-aware distribution modulation strategies to capture streaming distributions and field-dependent semantics. Extensive offline and online experiments show that DAES significantly outperforms existing approaches, and it has been fully deployed on a leading short-video platform with hundreds of millions of daily active users.
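To make the reservoir-sampling idea concrete, the following is a minimal sketch of the textbook technique (Algorithm R) used to estimate the empirical CDF of one numerical feature from a stream. It is an illustration under stated assumptions, not the DAES estimator: the class name `ReservoirSampler`, its `capacity` and `seed` parameters, and the `cdf` helper are all hypothetical. Note also that classic Algorithm R samples uniformly over the entire history of the stream, so on its own it does not favor recent data; how DAES handles non-i.i.d. streams is not specified in the abstract.

```python
import bisect
import random


class ReservoirSampler:
    """Textbook Algorithm R: a fixed-size uniform sample of a stream.

    Hypothetical illustration only; not the estimator used by DAES.
    """

    def __init__(self, capacity, seed=None):
        self.capacity = capacity      # maximum number of stored values
        self.samples = []             # the reservoir itself
        self.seen = 0                 # total number of values observed
        self._rng = random.Random(seed)

    def update(self, value):
        """Observe one streamed feature value."""
        self.seen += 1
        if len(self.samples) < self.capacity:
            # Fill phase: keep everything until the reservoir is full.
            self.samples.append(value)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.samples[j] = value

    def cdf(self, value):
        """Empirical CDF of `value` under the current reservoir."""
        if not self.samples:
            return 0.0
        ordered = sorted(self.samples)
        return bisect.bisect_right(ordered, value) / len(ordered)


# Usage: stream values through the sampler, then query a normalized position.
sampler = ReservoirSampler(capacity=1000, seed=0)
for x in range(10_000):
    sampler.update(float(x))
position = sampler.cdf(5000.0)  # a value in [0, 1], approximately 0.5
```

In a setup like this, the CDF value could serve as a distribution-aware input to an embedding layer in place of a fixed offline bin index; that pairing is an assumption made for illustration rather than a description of the paper's architecture.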