Towards An Efficient LLM Training Paradigm for CTR Prediction

📅 2025-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low training efficiency of large language models (LLMs) in click-through rate (CTR) prediction—particularly under sliding-window paradigms that incur O(mn²) computational complexity—this paper proposes Dynamic Target Isolation (DTI), a novel training paradigm. DTI is the first approach to jointly identify and mitigate two critical bottlenecks: hidden-state leakage and overfitting to positional bias. It achieves this through three key techniques: sequence parallelization, context-isolating masking, and lightweight position decoupling, enabling efficient parallel training over k target interactions. Evaluated on three public CTR benchmarks, DTI reduces average training time by 92% (e.g., from 70.5 to 5.31 hours) without compromising prediction accuracy. The method significantly enhances the scalability and practical applicability of LLMs for industrial-scale CTR prediction tasks.

📝 Abstract
Large Language Models (LLMs) have demonstrated tremendous potential as the next-generation ranking-based recommendation system. Many recent works have shown that LLMs can significantly outperform conventional click-through-rate (CTR) prediction approaches. Despite such promising results, the computational inefficiency inherent in the current training paradigm makes it particularly challenging to train LLMs for ranking-based recommendation tasks on large datasets. To train LLMs for CTR prediction, most existing studies adopt the prevalent "sliding-window" paradigm. Given a sequence of $m$ user interactions, a unique training prompt is constructed for each interaction by designating it as the prediction target, with its preceding $n$ interactions serving as context. In turn, the sliding-window paradigm results in an overall complexity of $O(mn^2)$ that scales linearly with the length of user interactions. Consequently, directly training LLMs with such a strategy can incur prohibitively high training costs as the length of interactions grows. To alleviate this computational inefficiency, we propose a novel training paradigm, namely Dynamic Target Isolation (DTI), that structurally parallelizes the training of $k$ (where $k \gg 1$) target interactions. Furthermore, we identify two major bottlenecks, hidden-state leakage and positional-bias overfitting, that limit DTI to scaling up to only a small value of $k$ (e.g., 5), and propose a computationally light solution to effectively tackle each. Through extensive experiments on three widely adopted public CTR datasets, we empirically show that DTI reduces training time by an average of $\textbf{92\%}$ (e.g., from $70.5$ hrs to $5.31$ hrs) without compromising CTR prediction performance.
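To make the $O(mn^2)$ cost concrete, the sliding-window prompt construction described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the paper's code; the function and field names (`sliding_window_prompts`, `context`, `target`) are assumptions.

```python
# Hypothetical sketch of the "sliding-window" training paradigm:
# one training prompt per interaction, with the preceding n
# interactions serving as context.

def sliding_window_prompts(interactions, n):
    """Build m prompts from m interactions; each prompt pairs one
    target interaction with its up-to-n preceding interactions."""
    prompts = []
    for i, target in enumerate(interactions):
        context = interactions[max(0, i - n):i]  # preceding n interactions
        prompts.append({"context": context, "target": target})
    return prompts

# Each prompt has length O(n), and self-attention over a length-n
# prompt costs O(n^2); with m prompts the total training cost is
# O(m * n^2), i.e., linear in the interaction-sequence length m.
prompts = sliding_window_prompts(list(range(8)), n=3)
assert len(prompts) == 8
assert prompts[5]["context"] == [2, 3, 4]
```

The key inefficiency DTI targets is visible here: consecutive prompts share almost all of their context, yet each is encoded from scratch.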
Problem

Research questions and friction points this paper is trying to address.

Addresses computational inefficiency in LLM training for CTR prediction.
Proposes Dynamic Target Isolation to parallelize training of multiple interactions.
Mitigates hidden-state leakage and positional-bias overfitting so DTI can scale to large values of $k$.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Target Isolation (DTI) for parallel training
Addresses hidden-state leakage and positional bias overfitting
Reduces training time by 92% without performance loss
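The "context-isolating masking" that prevents hidden-state leakage can be illustrated with a toy attention mask in which $k$ target tokens share one context but cannot attend to each other. This is a minimal sketch under assumed semantics (the paper's exact mask construction may differ); `context_isolating_mask` is a hypothetical name.

```python
# Illustrative context-isolating attention mask: k target positions
# share a length-c context; each target attends to the context and to
# itself, but never to another target, so no hidden state leaks
# between the jointly trained targets.

def context_isolating_mask(c, k):
    """Return a (c+k) x (c+k) boolean mask where mask[q][s] is True
    iff query position q may attend to source position s."""
    size = c + k
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        for s in range(size):
            if q < c:
                mask[q][s] = s <= q          # context: ordinary causal attention
            else:
                mask[q][s] = s < c or s == q  # target: context + itself only
    return mask

mask = context_isolating_mask(c=3, k=2)
# the two targets (positions 3 and 4) are mutually invisible
assert not mask[3][4] and not mask[4][3]
assert mask[3][3] and all(mask[3][s] for s in range(3))
```

Because all $k$ targets reuse one shared context encoding instead of $k$ separate sliding-window prompts, the per-target attention cost drops, which is where the reported 92% training-time reduction comes from.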