🤖 AI Summary
This work investigates how large language models (LLMs) used as text encoders affect text-based collaborative filtering (TCF) performance, systematically evaluating LLMs ranging from 100M to 100B parameters. Methodologically, it combines fine-tuning, zero-shot transfer, text-embedding alignment, and cross-domain recommendation evaluation. Key contributions are threefold: (1) it is the first to reveal a non-monotonic performance curve for TCF under LLM scaling, exhibiting saturation and even performance reversal beyond certain scales; (2) it empirically validates the feasibility of generic textual item representations, challenging the conventional ID-dependent paradigm; (3) a 10B-parameter model achieves a 12.7% Recall@10 improvement on the News and Amazon datasets, yet shows degraded generalization, exposing diminishing returns from scale expansion. Collectively, this study provides critical empirical evidence and theoretical insights for developing transferable, "one model fits all" recommendation frameworks.
📝 Abstract
Text-based collaborative filtering (TCF) has emerged as the prominent technique for text and news recommendation, employing language models (LMs) as text encoders to represent items. However, the current landscape of TCF models mainly relies on the utilization of relatively small or medium-sized LMs. The potential impact of using larger, more powerful language models (such as those with over 100 billion parameters) as item encoders on recommendation performance remains uncertain. Can we anticipate unprecedented results and discover new insights? To address this question, we undertake a comprehensive series of experiments aimed at exploring the performance limits of the TCF paradigm. Specifically, we progressively augment the scale of item encoders, ranging from one hundred million to one hundred billion parameters, in order to reveal the scaling limits of the TCF paradigm. Moreover, we investigate whether these exceptionally large LMs have the potential to establish a universal item representation for the recommendation task, thereby revolutionizing the traditional ID paradigm, which is considered a significant obstacle to developing transferable "one model fits all" recommender models. Our study not only demonstrates positive results but also uncovers unexpected negative outcomes, illuminating the current state of the TCF paradigm within the community. These findings will evoke deep reflection and inspire further research on text-based recommender systems.
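The TCF paradigm described above can be sketched minimally: a text encoder embeds each item's text, a user profile aggregates the embeddings of interacted items, candidates are ranked by dot product, and Recall@10 measures retrieval quality. This is an illustrative assumption-laden sketch, not the paper's implementation: the "encoder" here is a deterministic hash-based stub standing in for an LLM, and all function names and the mean-pooling aggregation are hypothetical.

```python
# Minimal TCF sketch. The LM encoder is a hash-based stub (assumption);
# in the paper's setting it would be an LLM with 100M-100B parameters.
import hashlib
import math

DIM = 8  # toy embedding dimension

def encode_text(text: str) -> list[float]:
    """Stand-in for an LM text encoder: deterministic unit-norm pseudo-embedding."""
    digest = hashlib.sha256(text.encode()).digest()
    vec = [b / 255.0 for b in digest[:DIM]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def user_embedding(history: list[str]) -> list[float]:
    """User profile = mean of the embeddings of items the user interacted with."""
    vecs = [encode_text(t) for t in history]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def score(user_vec: list[float], item_text: str) -> float:
    """Rank candidates by dot product between user profile and item embedding."""
    item_vec = encode_text(item_text)
    return sum(u * i for u, i in zip(user_vec, item_vec))

def recall_at_k(ranked: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of relevant items that appear in the top-k ranked list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / max(1, len(relevant))
```

Because the item representation comes purely from text, the same frozen encoder can in principle serve unseen items and new domains, which is the "universal item representation" question the paper probes.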