🤖 AI Summary
To address the low training efficiency of large language models (LLMs) for click-through rate (CTR) prediction, particularly under the sliding-window paradigm and its O(mn²) computational complexity, this paper proposes Dynamic Target Isolation (DTI), a novel training paradigm. DTI identifies and mitigates two critical bottlenecks, hidden-state leakage and positional-bias overfitting, through three key techniques: sequence parallelization, context-isolating masking, and lightweight position decoupling, which together enable efficient parallel training over k target interactions. Evaluated on three public CTR benchmarks, DTI reduces average training time by 92% (e.g., from 70.5 to 5.31 hours) without compromising prediction accuracy, significantly enhancing the scalability and practical applicability of LLMs for industrial-scale CTR prediction.
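Below is a minimal sketch of what a context-isolating attention mask could look like for a packed sequence of shared context tokens followed by k isolated targets; the function name, token-level granularity, and PyTorch framing are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch (not the authors' code): one way to build a context-isolating
# attention mask over a packed sequence of n shared context tokens followed by
# k target tokens, so each target attends to the context and itself but never
# to the other targets (preventing hidden-state leakage between targets).
import torch

def context_isolating_mask(n_context: int, k_targets: int) -> torch.Tensor:
    """Boolean mask of shape (L, L); True means attention is allowed."""
    L = n_context + k_targets
    mask = torch.zeros(L, L, dtype=torch.bool)
    # Standard causal (lower-triangular) attention over the shared context.
    mask[:n_context, :n_context] = torch.tril(
        torch.ones(n_context, n_context, dtype=torch.bool)
    )
    for t in range(k_targets):
        row = n_context + t
        mask[row, :n_context] = True   # each target sees the shared context...
        mask[row, row] = True          # ...and itself,
        # but not the other k-1 targets, so their hidden states cannot leak.
    return mask

if __name__ == "__main__":
    # Example: 4 shared context tokens, 3 targets packed into one forward pass.
    print(context_isolating_mask(4, 3).int())
```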
📝 Abstract
Large Language Models (LLMs) have demonstrated tremendous potential as next-generation ranking-based recommendation systems. Many recent works have shown that LLMs can significantly outperform conventional click-through rate (CTR) prediction approaches. Despite such promising results, the computational inefficiency inherent in the current training paradigm makes it particularly challenging to train LLMs for ranking-based recommendation tasks on large datasets. To train LLMs for CTR prediction, most existing studies adopt the prevalent "sliding-window" paradigm. Given a sequence of $m$ user interactions, a unique training prompt is constructed for each interaction by designating it as the prediction target, with its preceding $n$ interactions serving as context. The sliding-window paradigm thus results in an overall complexity of $O(mn^2)$ that scales linearly with the length of the user interaction sequence. Consequently, directly adopting this strategy to train LLMs can incur prohibitively high training costs as the interaction sequence grows. To alleviate this computational inefficiency, we propose a novel training paradigm, namely Dynamic Target Isolation (DTI), that structurally parallelizes the training of $k$ (where $k \gg 1$) target interactions. Furthermore, we identify two major bottlenecks, hidden-state leakage and positional-bias overfitting, that otherwise limit DTI to a small value of $k$ (e.g., 5), and propose a computationally light solution to effectively tackle each. Through extensive experiments on three widely adopted public CTR datasets, we empirically show that DTI reduces training time by an average of $\textbf{92\%}$ (e.g., from $70.5$ hrs to $5.31$ hrs), without compromising CTR prediction performance.
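As a rough illustration of the complexity argument above, the sketch below compares an attention-only cost model for the sliding-window paradigm against a DTI-style packed sequence that scores k targets per forward pass; the cost model and the example values of m, n, and k are simplifying assumptions, not figures from the paper.

```python
def sliding_window_cost(m: int, n: int) -> int:
    """m separate prompts, each paying quadratic attention over an n-token window."""
    return m * n * n  # O(m * n^2) overall

def packed_cost(m: int, n: int, k: int) -> int:
    """ceil(m / k) packed sequences of length n + k, each paying (n + k)^2 attention."""
    num_sequences = -(-m // k)  # ceiling division
    return num_sequences * (n + k) ** 2

if __name__ == "__main__":
    m, n, k = 10_000, 50, 40  # illustrative values only
    print(f"sliding-window cost:  {sliding_window_cost(m, n):,}")  # 25,000,000
    print(f"packed (DTI-style):   {packed_cost(m, n, k):,}")       # 2,025,000
```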