Are Large Language Models Really Effective for Training-Free Cold-Start Recommendation?

๐Ÿ“… 2025-12-15
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses training-free cold-start recommendation (TFCSR), where new users have zero interaction history and no task-specific fine-tuning is permitted. We systematically compare, for the first time under a unified benchmark, two paradigms: large language model (LLM)-based re-ranking (using open-source LLMs including Llama and Qwen) versus text embedding model (TEM)-based retrieval (using state-of-the-art TEMs such as E5 and BGE). Experimental results demonstrate that TEMs consistently and significantly outperform LLM re-rankers across both cold-start and warm-start settings, while achieving 3โ€“10ร— faster inference and substantially lower computational overhead. Our findings challenge the prevailing assumption that LLMs inherently surpass traditional embedding methods in zero-shot recommendation. Instead, we establish that TEMs offer superior effectiveness, practicality, and scalability for TFCSRโ€”providing a more efficient and deployable technical pathway for cold-start recommendation.

๐Ÿ“ Abstract
Recommender systems usually rely on large-scale interaction data to learn from users' past behaviors and make accurate predictions. However, real-world applications often face situations where no training data is available, such as when launching new services or handling entirely new users. In such cases, conventional approaches cannot be applied. This study focuses on training-free recommendation, where no task-specific training is performed, and particularly on training-free cold-start recommendation (TFCSR), the more challenging case where the target user has no interactions. Large language models (LLMs) have recently been explored as a promising solution, and numerous approaches have been proposed. As the capabilities of text embedding models (TEMs) improve, they are increasingly recognized as applicable to training-free recommendation, but no prior work has directly compared LLMs and TEMs under identical conditions. We present the first controlled experiments that systematically evaluate these two approaches in the same setting. The results show that TEMs outperform LLM rerankers, and this trend holds not only in cold-start settings but also in warm-start settings with rich interactions. These findings indicate that direct LLM ranking is not the only viable option, contrary to the commonly held belief, and TEM-based approaches provide a stronger and more scalable basis for training-free recommendation.
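The TEM-based retrieval paradigm the paper evaluates can be sketched in a few lines: embed the cold-start user's profile text and all item descriptions with a text embedding model, then rank items by cosine similarity, with no task-specific training. The sketch below is illustrative only; it substitutes a toy bag-of-words embedding for a real TEM such as E5 or BGE, and the function names and example texts are hypothetical, not from the paper.

```python
import numpy as np

def bag_of_words_embed(texts, vocab):
    # Stand-in for a real text embedding model (e.g. E5 or BGE):
    # a unit-normalized bag-of-words vector over a shared vocabulary.
    index = {tok: i for i, tok in enumerate(vocab)}
    vecs = np.zeros((len(texts), len(vocab)))
    for row, text in enumerate(texts):
        for tok in text.lower().split():
            if tok in index:
                vecs[row, index[tok]] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-9)

def rank_items(user_profile, item_texts, top_k=3):
    """TEM-style cold-start retrieval: embed the user's profile text and all
    item descriptions, then rank items by cosine similarity (no training)."""
    vocab = sorted({tok for t in [user_profile] + item_texts
                    for tok in t.lower().split()})
    user_vec = bag_of_words_embed([user_profile], vocab)[0]
    item_vecs = bag_of_words_embed(item_texts, vocab)
    scores = item_vecs @ user_vec  # cosine similarity: rows are unit vectors
    order = np.argsort(-scores)[:top_k]
    return [(item_texts[i], float(scores[i])) for i in order]

items = [
    "wireless noise-cancelling headphones",
    "stainless steel kitchen knife set",
    "bluetooth over-ear headphones with mic",
    "gardening tool starter kit",
]
print(rank_items("looking for headphones for music", items, top_k=2))
```

Because retrieval reduces to one embedding pass plus a matrix-vector product, this route scales to large catalogs far more cheaply than prompting an LLM to re-rank candidates, which is the efficiency gap the paper quantifies.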
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs vs TEMs for training-free cold-start recommendation
Comparing LLM and TEM performance under identical experimental conditions
Assessing scalability of TEM-based approaches for cold-start scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text embedding models outperform LLM rerankers in cold-start
Controlled experiments compare LLMs and TEMs under identical conditions
TEMs provide scalable basis for training-free recommendation tasks
Genki Kusano
NEC Corporation
Kenya Abe
NEC Corporation
Kunihiro Takeoka
NEC Corporation
Machine Learning · Natural Language Processing