Revisiting Language Models in Neural News Recommender Systems

📅 2025-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
It remains unclear whether large language models (LLMs) inherently improve neural news recommendation, particularly under resource constraints and cold-start conditions. Method: We systematically reproduce and extend comparative experiments across pre-trained language models (PLMs), shallow language models (SLMs), and LLMs within a unified neural recommendation framework on the MIND dataset, employing fine-grained fine-tuning and hyperparameter analysis. Contribution/Results: We find no monotonic positive correlation between language-model scale and recommendation accuracy: while LLMs yield no statistically significant overall AUC improvement, they achieve up to a 3.2% gain for cold-start users, validating their content-driven advantage. LLMs reduce reliance on user interaction history and make better semantic use of news text, but exhibit heightened hyperparameter sensitivity and substantially higher computational overhead. This work provides the first quantitative evidence of LLMs' "non-universal gains" in news recommendation and quantifies their specific value in cold-start scenarios.

📝 Abstract
Neural news recommender systems (RSs) have integrated language models (LMs) to encode news articles with rich textual information into representations, thereby improving the recommendation process. Most studies suggest that (i) news RSs achieve better performance with larger pre-trained language models (PLMs) than with shallow language models (SLMs), and (ii) large language models (LLMs) outperform PLMs. However, other studies indicate that PLMs sometimes lead to worse performance than SLMs. Thus, it remains unclear whether using larger LMs consistently improves the performance of news RSs. In this paper, we revisit, unify, and extend these comparisons of the effectiveness of LMs in news RSs using the real-world MIND dataset. We find that (i) larger LMs do not necessarily translate to better performance in news RSs, and (ii) they require stricter fine-tuning hyperparameter selection and greater computational resources than smaller LMs to achieve optimal recommendation performance. On the positive side, our experiments show that larger LMs lead to better recommendation performance for cold-start users: they alleviate dependency on extensive user interaction history and make recommendations more reliant on the news content.
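The pipeline the abstract describes (an LM encodes news text into vectors, a user representation aggregates clicked-news vectors, and candidates are ranked by similarity) can be sketched with a toy stand-in for the LM. Everything below is hypothetical for illustration: the one-hot word "embeddings" play the role of a PLM/LLM encoder, and the titles are made up; a real system would swap in learned representations.

```python
# Toy sketch of a content-based neural news RS scoring step.
# One-hot word vectors stand in for an LM's learned embeddings.

WORDS = ["markets", "rally", "team", "wins", "final",
         "storm", "hits", "coast", "stocks", "fall"]
DIM = len(WORDS)
VOCAB = {w: [1.0 if j == i else 0.0 for j in range(DIM)]
         for i, w in enumerate(WORDS)}

def encode_news(title: str) -> list:
    """Mean-pool word vectors -- the role an LM news encoder plays."""
    vecs = [VOCAB[w] for w in title.lower().split() if w in VOCAB]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def encode_user(clicked_titles: list) -> list:
    """Average the representations of previously clicked news."""
    vecs = [encode_news(t) for t in clicked_titles]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def score(user_vec: list, news_vec: list) -> float:
    """Dot-product relevance of a candidate for this user."""
    return sum(u * n for u, n in zip(user_vec, news_vec))

history = ["Markets rally", "Stocks fall"]
candidates = ["Stocks fall", "Team wins final", "Storm hits coast"]
u = encode_user(history)
ranked = sorted(candidates, key=lambda t: score(u, encode_news(t)),
                reverse=True)
print(ranked[0])  # the candidate sharing vocabulary with the history
```

With a richer encoder, semantically related titles score well even without word overlap; with an empty history (the cold-start case the paper studies), the user vector carries no signal, which is why content-driven LLM representations help there.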
Problem

Research questions and friction points this paper is trying to address.

Neural News Recommendation
Large-scale Language Models
Resource Consumption and New User Recommendation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural News Recommendation
Large-scale Language Models
User Cold-start Problem