Recommendation Is a Dish Better Served Warm

📅 2025-08-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing cold-start evaluation in recommender systems suffers from arbitrary boundary definitions and inconsistent thresholds, leading to unreliable and incomparable experimental results. Method: We systematically investigate how interaction thresholds—specifically item frequency during training and user history length during inference—affect evaluation outcomes across multiple benchmark datasets and models. Through quantitative analysis, we examine threshold-induced biases in data utilization, misclassification of cold instances, and degradation in evaluation accuracy. Contribution/Results: We propose a “dynamic cold-start boundary” paradigm that adaptively defines cold/hot users and items based on empirical data distribution and task-specific objectives, replacing rigid, fixed thresholds. Experiments demonstrate that this approach significantly improves evaluation consistency and cross-study comparability, prevents loss of valuable interactions, and mitigates label noise. Our framework establishes a more principled, reproducible, and scientifically grounded benchmark for cold-start research.
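As a rough illustration of the "dynamic cold-start boundary" idea, the sketch below defines the cold/warm split from the empirical item-frequency distribution (here, a percentile cut) instead of a fixed count. The percentile criterion and the 20% default are illustrative assumptions, not the paper's prescribed setting.

```python
from collections import Counter

def dynamic_cold_boundary(interactions, percentile=20):
    """Classify items as cold or warm using a data-driven threshold.

    `interactions` is a list of (user, item) pairs. Items whose training
    frequency falls at or below the chosen percentile of the empirical
    count distribution are labeled cold. The percentile value is an
    illustrative assumption, not the paper's actual criterion.
    """
    counts = Counter(item for _, item in interactions)
    sorted_counts = sorted(counts.values())
    # Index of the chosen percentile within the sorted count distribution.
    idx = max(0, int(len(sorted_counts) * percentile / 100) - 1)
    threshold = sorted_counts[idx]
    cold = {item for item, c in counts.items() if c <= threshold}
    warm = set(counts) - cold
    return threshold, cold, warm
```

Because the boundary adapts to each dataset's distribution, the same call yields different thresholds on sparse and dense datasets, which is the comparability property the summary highlights.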

📝 Abstract
In modern recommender systems, experimental settings typically filter out cold users and items based on a minimum interaction threshold. However, these thresholds are often chosen arbitrarily and vary widely across studies, leading to inconsistencies that can significantly affect the comparability and reliability of evaluation results. In this paper, we systematically explore the cold-start boundary by examining the criteria used to determine whether a user or an item should be considered cold. Our experiments incrementally vary the number of interactions for different items during training, and gradually update the length of user interaction histories during inference. We investigate these thresholds across several widely used datasets, commonly used in recent papers from top-tier conferences, and on multiple established recommender baselines. Our findings show that inconsistent selection of cold-start thresholds can either result in the unnecessary removal of valuable data or lead to the misclassification of cold instances as warm, introducing more noise into the system.
Problem

Research questions and friction points this paper is trying to address.

Examining arbitrary cold-start thresholds in recommender systems
Analyzing impact of inconsistent thresholds on evaluation reliability
Investigating data loss and noise from misclassified cold instances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically explore cold-start boundary criteria
Incrementally vary interaction thresholds during training
Investigate thresholds across multiple datasets and baselines
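The threshold sweep described above can be sketched as a simple filtering loop: for each candidate minimum-interaction threshold, count how many items and interactions survive. This is a simplified stand-in for the paper's experimental protocol; the function name and the reported fields are assumptions for illustration.

```python
from collections import Counter

def sweep_item_thresholds(train_pairs, thresholds):
    """For each candidate minimum-interaction threshold, report how many
    items and interactions survive filtering. A simplified stand-in for
    the paper's incremental threshold sweep."""
    counts = Counter(item for _, item in train_pairs)
    report = []
    for t in thresholds:
        kept_items = {i for i, c in counts.items() if c >= t}
        kept_interactions = sum(counts[i] for i in kept_items)
        report.append({
            "threshold": t,
            "items_kept": len(kept_items),
            "interactions_kept": kept_interactions,
        })
    return report
```

Plotting `interactions_kept` against `threshold` makes the data-loss trade-off the paper studies directly visible: aggressive thresholds discard valuable interactions, while lax ones admit noisy, barely observed items.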