Efficient Cross-Architecture Knowledge Transfer for Large-Scale Online User Response Prediction

πŸ“… 2026-02-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the performance degradation that large-scale user response prediction systems suffer during model architecture transitions, caused by high retraining costs and data retention constraints. To this end, we propose CrossAdapt, a two-stage transfer framework. In the offline phase, it enables rapid embedding migration via dimension-adaptive projection and reduces computational overhead through progressive network distillation and strategic sampling. In the online phase, it employs asymmetric co-distillation and a distribution-aware adaptation mechanism to balance historical knowledge preservation with rapid adaptation to new data. Notably, CrossAdapt introduces the first non-iterative embedding transfer strategy, effectively tackling the challenges of heterogeneous architectures and massive embedding table migration. Experiments show AUC improvements of 0.27–0.43% and training time reductions of 43–71% across three public datasets; in deployment on WeChat Channels, which serves tens of millions of daily active users, CrossAdapt significantly mitigates AUC drops, LogLoss increases, and prediction bias.
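The summary's "non-iterative embedding migration via dimension-adaptive projection" can be illustrated with a minimal sketch: a trained embedding table of width `d_old` is mapped to the new architecture's width `d_new` by a single fixed linear projection, so no gradient iterations are spent re-learning the table. The random-projection choice and the function name are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def transfer_embeddings(old_table, new_dim, seed=0):
    """Project a trained embedding table (vocab x d_old) to d_new in one
    shot, so the new architecture warm-starts from migrated embeddings
    instead of retraining them.  Hypothetical sketch of the idea."""
    rng = np.random.default_rng(seed)
    d_old = old_table.shape[1]
    # Random, approximately norm-preserving projection d_old -> d_new.
    proj = rng.normal(0.0, 1.0 / np.sqrt(d_old), size=(d_old, new_dim))
    return old_table @ proj

# Toy "trained" table: 1000 ids, 64-dim, migrated to a 32-dim model.
old = np.random.default_rng(1).normal(size=(1000, 64))
new = transfer_embeddings(old, new_dim=32)
print(new.shape)  # (1000, 32)
```

The point of the sketch is the cost profile: the migration is a single matrix multiply over the table rather than an iterative distillation pass over historical data.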

πŸ“ Abstract
Deploying new architectures in large-scale user response prediction systems incurs high model switching costs due to expensive retraining on massive historical data and performance degradation under data retention constraints. Existing knowledge distillation methods struggle with architectural heterogeneity and the prohibitive cost of transferring large embedding tables. We propose CrossAdapt, a two-stage framework for efficient cross-architecture knowledge transfer. The offline stage enables rapid embedding transfer via dimension-adaptive projections without iterative training, combined with progressive network distillation and strategic sampling to reduce computational cost. The online stage introduces asymmetric co-distillation, where students update frequently while teachers update infrequently, together with a distribution-aware adaptation mechanism that dynamically balances historical knowledge preservation and fast adaptation to evolving data. Experiments on three public datasets show that CrossAdapt achieves 0.27–0.43% AUC improvements while reducing training time by 43–71%. Large-scale deployment on Tencent WeChat Channels (~10M daily samples) further demonstrates its effectiveness, significantly mitigating AUC degradation, LogLoss increase, and prediction bias compared to standard distillation baselines.
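The abstract's asymmetric co-distillation schedule (frequent student updates, infrequent teacher updates) can be sketched as a toy online loop; the function names, the `teacher_every` parameter, and the update rules are illustrative assumptions rather than the paper's actual training procedure.

```python
def asymmetric_co_distill(stream, student_step, teacher_step, teacher_every=50):
    """Toy online training loop: the student updates on every incoming
    batch (fast adaptation to evolving data), while the teacher is
    refreshed only every `teacher_every` batches (preserving stable
    historical knowledge).  Returns the update counts for inspection."""
    updates = {"student": 0, "teacher": 0}
    for i, batch in enumerate(stream):
        student_step(batch)            # frequent student update
        updates["student"] += 1
        if i % teacher_every == 0:     # infrequent teacher refresh
            teacher_step(batch)
            updates["teacher"] += 1
    return updates

# 200 streaming batches: the student takes 200 steps, the teacher only 4.
counts = asymmetric_co_distill(range(200), lambda b: None, lambda b: None,
                               teacher_every=50)
print(counts)  # {'student': 200, 'teacher': 4}
```

The asymmetry is the design choice worth noting: a slowly moving teacher provides a stable distillation target, while the student alone tracks distribution shift in the fresh data.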
Problem

Research questions and friction points this paper is trying to address.

cross-architecture knowledge transfer
user response prediction
model switching cost
embedding transfer
large-scale deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-architecture knowledge transfer
embedding transfer
asymmetric co-distillation
distribution-aware adaptation
large-scale user response prediction
Yucheng Wu
Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education & School of Computer Science, Peking University
Yuekui Yang
NASA Goddard Space Flight Center
Radiative transfer, Remote sensing
Hongzheng Li
Advertising Engineering Department, CDG, Tencent Corporation
Anan Liu
Advertising Engineering Department, CDG, Tencent Corporation
Jian Xiao
School of Computer Science and Information Engineering, Hefei University of Technology
multimodal, vision and language, text-video retrieval
Junjie Zhai
Advertising Engineering Department, CDG, Tencent Corporation
Huan Yu
Advertising Engineering Department, CDG, Tencent Corporation
Shaoping Ma
Department of Computer Science and Technology, Tsinghua University
Leye Wang
Tenured Associate Professor, Peking University
Ubiquitous Computing, Urban Computing, Crowdsensing, Federated Learning