Large Language Model Empowered Recommendation Meets All-domain Continual Pre-Training

πŸ“… 2025-04-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the semantic-behavior misalignment between general-purpose LLM representations and domain-specific user preferences, as well as the poor cross-domain adaptability of single-domain fine-tuning in multi-domain recommendation, this paper proposes CPRec, an all-domain continual pre-training framework. Methodologically: (1) a unified prompt template organizes heterogeneous multi-domain user behaviors into domain-specific and all-domain mixed behavioral sequences that emulate real-world decision logic; (2) a recommendation-oriented continual pre-training paradigm is introduced to circumvent the biases of domain-specific fine-tuning; (3) a Warmup-Stable-Annealing learning rate schedule progressively injects behavioral knowledge. Pre-trained on a large-scale dataset covering seven domains and evaluated on five real-world datasets from two platforms, CPRec significantly narrows the semantic-behavior gap and consistently outperforms state-of-the-art methods across all recommendation scenarios.

πŸ“ Abstract
Recent research efforts have investigated how to integrate Large Language Models (LLMs) into recommendation, capitalizing on their semantic comprehension and open-world knowledge for user behavior understanding. These approaches predominantly employ supervised fine-tuning on single-domain user interactions to adapt LLMs for specific recommendation tasks. However, they typically encounter dual challenges: the mismatch between general language representations and domain-specific preference patterns, as well as the limited adaptability to multi-domain recommendation scenarios. To bridge these gaps, we introduce CPRec -- an All-domain Continual Pre-Training framework for Recommendation -- designed to holistically align LLMs with universal user behaviors through the continual pre-training paradigm. Specifically, we first design a unified prompt template and organize users' multi-domain behaviors into domain-specific behavioral sequences and all-domain mixed behavioral sequences that emulate real-world user decision logic. To optimize behavioral knowledge infusion, we devise a Warmup-Stable-Annealing learning rate schedule tailored for the continual pre-training paradigm in recommendation to progressively enhance the LLM's capability in knowledge adaptation from open-world knowledge to universal recommendation tasks. To evaluate the effectiveness of our CPRec, we implement it on a large-scale dataset covering seven domains and conduct extensive experiments on five real-world datasets from two distinct platforms. Experimental results confirm that our continual pre-training paradigm significantly mitigates the semantic-behavioral discrepancy and achieves state-of-the-art performance in all recommendation scenarios. The source code will be released upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Mismatch between general language representations and domain-specific preference patterns
Limited adaptability to multi-domain recommendation scenarios
Semantic-behavioral discrepancy in recommendation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

All-domain continual pre-training for recommendation
Unified prompt template for multi-domain behaviors
Warmup-Stable-Annealing learning rate schedule
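The Warmup-Stable-Annealing schedule can be pictured as three phases: a linear ramp to a peak rate, a long plateau during which behavioral knowledge is infused, and a final decay. The paper does not publish its exact hyperparameters, so the phase fractions, peak rate, and floor rate in this minimal sketch are illustrative assumptions, and `wsa_lr` is a hypothetical helper name:

```python
def wsa_lr(step, total_steps, peak_lr=1e-4, floor_lr=1e-6,
           warmup_frac=0.05, stable_frac=0.75):
    """Piecewise Warmup-Stable-Annealing schedule (illustrative values):
    linear warmup -> constant plateau -> linear anneal to a floor rate."""
    warmup_steps = int(total_steps * warmup_frac)
    stable_steps = int(total_steps * stable_frac)
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    if step < warmup_steps + stable_steps:
        # Stable: hold the peak rate while behavioral knowledge is infused
        return peak_lr
    # Annealing: decay linearly from peak_lr down to floor_lr
    anneal_steps = total_steps - warmup_steps - stable_steps
    progress = (step - warmup_steps - stable_steps) / max(1, anneal_steps)
    return peak_lr + (floor_lr - peak_lr) * progress
```

In practice such a function would be wrapped in an optimizer scheduler (e.g. a per-step multiplier); the point of the shape is that the long stable plateau dominates training, with annealing reserved for the end.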