CADET: Context-Conditioned Ads CTR Prediction With a Decoder-Only Transformer

📅 2026-02-11
🤖 AI Summary
This work addresses key challenges in industrial click-through rate (CTR) prediction—namely, post-scoring context modeling, online-offline inconsistency, and scalability—by proposing an end-to-end decoder-only Transformer architecture. The model explicitly captures post-scoring signals such as ad position through a context-conditioned multi-tower prediction head and integrates several technical innovations to balance expressiveness and inference efficiency: self-gated attention, timestamp-based RoPE positional encoding, session-aware masking, tensor packing, sequence chunking, and a customized FlashAttention kernel. Online A/B tests on LinkedIn’s homepage feed advertising system demonstrate an 11.04% CTR improvement over the LiRank baseline, leading to its successful deployment as the primary serving model handling the main traffic load.

📝 Abstract
Click-through rate (CTR) prediction is fundamental to online advertising systems. While Deep Learning Recommendation Models (DLRMs) with explicit feature interactions have long dominated this domain, recent advances in generative recommenders have shown promising results in content recommendation. However, adapting these transformer-based architectures to ads CTR prediction still presents unique challenges, including handling post-scoring contextual signals, maintaining offline-online consistency, and scaling to industrial workloads. We present CADET (Context-Conditioned Ads Decoder-Only Transformer), an end-to-end decoder-only transformer for ads CTR prediction deployed at LinkedIn. Our approach introduces several key innovations: (1) a context-conditioned decoding architecture with multi-tower prediction heads that explicitly model post-scoring signals such as ad position, resolving the chicken-and-egg problem between predicted CTR and ranking; (2) a self-gated attention mechanism that stabilizes training by adaptively regulating information flow at both representation and interaction levels; (3) a timestamp-based variant of Rotary Position Embedding (RoPE) that captures temporal relationships across timescales from seconds to months; (4) session masking strategies that prevent the model from learning dependencies on unavailable in-session events, addressing train-serve skew; and (5) production engineering techniques including tensor packing, sequence chunking, and custom FlashAttention kernels that enable efficient training and serving at scale. In online A/B testing, CADET achieves an 11.04% CTR lift compared to the production LiRank baseline model, a hybrid ensemble of DCNv2 and sequential encoders. The system has been successfully deployed on LinkedIn's advertising platform, serving the main traffic for homefeed sponsored updates.
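The abstract does not spell out the session masking rule, so the following is only a minimal sketch of one plausible reading: a causal attention mask in which a position may attend to itself and to earlier events from other sessions, but not to earlier events from its own session, since those may not yet be available at serving time. The function name `session_causal_mask` and the exact visibility rule are assumptions for illustration, not the paper's implementation.

```python
from typing import List


def session_causal_mask(session_ids: List[int]) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed rule (hypothetical, not taken from the paper):
      * causal: j must not come after i
      * session-aware: earlier events from the same session as i are hidden
    """
    n = len(session_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # only the current and earlier positions
            same_session = session_ids[j] == session_ids[i]
            # A position always sees itself; same-session history is masked.
            mask[i][j] = (j == i) or not same_session
    return mask


# Five events in chronological order, grouped into three sessions.
mask = session_causal_mask([0, 0, 1, 1, 2])
for row in mask:
    print("".join("1" if visible else "." for visible in row))
```

In this toy sequence the fourth event (second event of session 1) can attend to both events of session 0 and to itself, but not to the first event of its own session. A production system would apply such a mask inside the attention kernel rather than materializing a dense matrix.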
Problem

Research questions and friction points this paper is trying to address.

CTR prediction
contextual signals
offline-online consistency
industrial-scale recommendation
post-scoring features
Innovation

Methods, ideas, or system contributions that make the work stand out.

decoder-only transformer
context-conditioned CTR prediction
self-gated attention
timestamp-based RoPE
train-serve skew mitigation
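As a rough illustration of the timestamp-based RoPE idea listed above, the sketch below applies the standard rotary rotation but uses an event timestamp in seconds in place of the integer token position. The frequency schedule and the parameter `max_period` (playing the role of the base, 10000, in the original RoPE) are assumptions chosen for illustration; the abstract does not state the schedule CADET uses.

```python
import math
from typing import List


def rotate_by_time(x: List[float], t_seconds: float,
                   max_period: float = 1.0e7) -> List[float]:
    """Rotate consecutive pairs of x by angles proportional to a timestamp.

    Pair k uses frequency max_period ** (-2k / d), so early pairs react to
    second-level gaps and later pairs to much longer gaps (assumed schedule).
    """
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector length must be even")
    out: List[float] = []
    for k in range(d // 2):
        freq = max_period ** (-2.0 * k / d)
        angle = t_seconds * freq
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * k], x[2 * k + 1]
        out.extend([a * c - b * s, a * s + b * c])  # 2-D rotation of the pair
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


q = [0.3, -1.2, 0.7, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# Two query/key pairs with the same 600-second gap at very different times.
early = dot(rotate_by_time(q, 1000.0), rotate_by_time(k, 400.0))
late = dot(rotate_by_time(q, 5_000_600.0), rotate_by_time(k, 5_000_000.0))
print(abs(early - late) < 1e-6)
```

The two scores agree up to floating-point error because the rotated dot product depends only on the time difference between query and key, which is the property that makes rotary encodings attractive for irregularly spaced event histories.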
Authors
David Pardoe, LinkedIn, Mountain View, California, USA
Neil Daftary, LinkedIn, Mountain View, California, USA
Miro Furtado, LinkedIn, Mountain View, California, USA
Aditya Aiyer, LinkedIn, Mountain View, California, USA
Yu Wang, LinkedIn, Mountain View, California, USA
Liuqing Li, Virginia Tech (Digital Library, Information Retrieval, Social Media)
Tao Song, LinkedIn, Mountain View, California, USA
Lars Hertel, LinkedIn (Hyperparameter Optimization, Bayesian Optimization)
Young Jin Yun, LinkedIn, Mountain View, California, USA
Senthil Radhakrishnan, LinkedIn, Mountain View, California, USA
Zhiwei Wang, LinkedIn, Mountain View, California, USA
Tommy Li, Freie Universität Berlin
Khai Tran, LinkedIn, Mountain View, California, USA
Ananth Nagarajan, LinkedIn, Mountain View, California, USA
Ali Naqvi, LinkedIn, Mountain View, California, USA
Yue Zhang, Meta (Cyber Physical System, Multimodal Sensing, Machine Learning)
Renpeng Fang, LinkedIn, Mountain View, California, USA
Avi Romascanu, LinkedIn, Mountain View, California, USA
Arjun Kulothungun, LinkedIn, Mountain View, California, USA
Deepak Kumar, LinkedIn, Mountain View, California, USA
Praneeth Boda, LinkedIn, Mountain View, California, USA
Fedor Borisyuk, LinkedIn (Machine learning)
Ruoyan Wang, LinkedIn, Mountain View, California, USA