Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation?

๐Ÿ“… 2025-12-15
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses training-free cold-start recommendation (TFCSR), where new users have zero interaction history and no task-specific fine-tuning is permitted. We systematically compare, for the first time under a unified benchmark, two paradigms: large language model (LLM)-based re-ranking (using open-source LLMs including Llama and Qwen) versus text embedding model (TEM)-based retrieval (using state-of-the-art TEMs such as E5 and BGE). Experimental results demonstrate that TEMs consistently and significantly outperform LLM re-rankers across both cold-start and warm-start settings, while achieving 3โ€“10ร— faster inference and substantially lower computational overhead. Our findings challenge the prevailing assumption that LLMs inherently surpass traditional embedding methods in zero-shot recommendation. Instead, we establish that TEMs offer superior effectiveness, practicality, and scalability for TFCSRโ€”providing a more efficient and deployable technical pathway for cold-start recommendation.

๐Ÿ“ Abstract
Recommender systems usually rely on large-scale interaction data to learn from users' past behaviors and make accurate predictions. However, real-world applications often face situations where no training data is available, such as when launching new services or handling entirely new users. In such cases, conventional approaches cannot be applied. This study focuses on training-free recommendation, where no task-specific training is performed, and particularly on training-free cold-start recommendation (TFCSR), the more challenging case where the target user has no interactions. Large language models (LLMs) have recently been explored as a promising solution, and numerous approaches have been proposed. As the capabilities of text embedding models (TEMs) improve, they are increasingly recognized as applicable to training-free recommendation, but no prior work has directly compared LLMs and TEMs under identical conditions. We present the first controlled experiments that systematically evaluate these two approaches in the same setting. The results show that TEMs outperform LLM rerankers, and this trend holds not only in cold-start settings but also in warm-start settings with rich interactions. These findings indicate that direct LLM ranking is not the only viable option, contrary to the commonly held belief, and TEM-based approaches provide a stronger and more scalable basis for training-free recommendation.
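The TEM-based retrieval paradigm the paper evaluates can be sketched in a few lines: embed the cold-start user's profile text and all item descriptions with a text embedding model, then rank items by cosine similarity, with no task-specific training. The sketch below is illustrative only; it substitutes a toy bag-of-words embedding for a real TEM such as E5 or BGE, and the function names and example texts are hypothetical, not from the paper.

```python
import numpy as np

def bag_of_words_embed(texts, vocab):
    # Stand-in for a real text embedding model (e.g. E5 or BGE):
    # a unit-normalized bag-of-words vector over a shared vocabulary.
    index = {tok: i for i, tok in enumerate(vocab)}
    vecs = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in text.lower().split():
            if tok in index:
                vecs[row, index[tok]] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def rank_items(user_profile, item_texts, top_k=3):
    """TEM-style cold-start retrieval: embed the user's profile text and all
    item descriptions, then rank items by cosine similarity (no training)."""
    vocab = sorted({tok for t in [user_profile] + item_texts
                    for tok in t.lower().split()})
    user_vec = bag_of_words_embed([user_profile], vocab)[0]
    item_vecs = bag_of_words_embed(item_texts, vocab)
    scores = item_vecs @ user_vec  # cosine similarity: rows are unit vectors
    order = np.argsort(-scores)[:top_k]
    return [(item_texts[i], float(scores[i])) for i in order]

items = [
    "wireless noise-cancelling headphones",
    "stainless steel kitchen knife set",
    "bluetooth over-ear headphones with mic",
    "gardening tool starter kit",
]
print(rank_items("looking for headphones for music", items, top_k=2))
```

Because retrieval reduces to one embedding pass plus a matrix-vector product, this route scales to large catalogs far more cheaply than prompting an LLM to re-rank candidates, which is the efficiency gap the paper quantifies.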
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs vs TEMs for training-free cold-start recommendation
Comparing LLM and TEM performance under identical experimental conditions
Assessing scalability of TEM-based approaches for cold-start scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text embedding models outperform LLM rerankers in cold-start
Controlled experiments compare LLMs and TEMs under identical conditions
TEMs provide scalable basis for training-free recommendation tasks
Genki Kusano
NEC Corporation
Kenya Abe
NEC Corporation
Kunihiro Takeoka
NEC Corporation
Machine Learning · Natural Language Processing