🤖 AI Summary
This work identifies an unreported default configuration in recommender system evaluation, namely RecBole's implicit early stopping, that silently biases hyperparameter optimization. The default prematurely terminates Random Search and Bayesian Optimization, severely restricting the effective search space and amplifying result variance to a degree comparable with the algorithmic differences between optimizers. Moving beyond conventional model-centric auditing, the study conducts a framework-behavior audit, systematically analyzing six recommendation models, two benchmark datasets, and multiple search strategies via execution tracing and variance quantification. The empirical results confirm that such defaults introduce substantial, invisible bias. The paper proposes concrete best practices, including explicit configuration logging, deterministic search-space bounding, and opt-in early stopping, to improve framework transparency, reproducibility, and experimental rigor across the recommender systems toolchain.
📝 Abstract
Hyperparameter optimization is critical for improving the performance of recommender systems, yet its implementation is often treated as a neutral or secondary concern. In this work, we shift focus from model benchmarking to auditing the behavior of RecBole, a widely used recommendation framework. We show that RecBole's internal defaults, particularly an undocumented early-stopping policy, can prematurely terminate Random Search and Bayesian Optimization. This limits search coverage in ways that are not visible to users. Using six models and two datasets, we compare search strategies and quantify both performance variance and search path instability. Our findings reveal that hidden framework logic can introduce variability comparable to the differences between search strategies. These results highlight the importance of treating frameworks as active components of experimental design and call for more transparent, reproducibility-aware tooling in recommender systems research. We provide actionable recommendations for researchers and developers to mitigate hidden configuration behaviors and improve the transparency of hyperparameter tuning workflows.
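Two of the recommended practices, explicit configuration logging and opt-in (rather than implicit) early stopping, can be illustrated with a minimal, framework-agnostic sketch. This is not RecBole's actual API; the `random_search` helper and its parameters are hypothetical, written only to show a search loop that never stops early unless the caller asks for it and that logs its full configuration up front:

```python
import json
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hpo-audit")

def random_search(objective, space, n_trials=20, patience=None, seed=0):
    """Random search with opt-in early stopping and explicit config logging.

    patience=None (the default) disables early stopping entirely, so the
    search visits all n_trials points unless the caller explicitly opts in.
    """
    # Log the full search configuration before tuning starts, so the run
    # can be audited and reproduced exactly.
    log.info("search config: %s", json.dumps(
        {"n_trials": n_trials, "patience": patience, "seed": seed,
         "space": {k: sorted(v) for k, v in space.items()}}))
    rng = random.Random(seed)
    best_score, best_params, stale = float("-inf"), None, 0
    for trial in range(n_trials):
        params = {k: rng.choice(v) for k, v in space.items()}
        score = objective(params)
        if score > best_score:
            best_score, best_params, stale = score, params, 0
        else:
            stale += 1
        # Early stopping fires only when the caller passed a patience value.
        if patience is not None and stale >= patience:
            log.info("early stop at trial %d (patience=%d)", trial, patience)
            break
    return best_params, best_score

# Toy objective over a small discrete space (hypothetical example).
space = {"lr": [0.001, 0.01, 0.1], "dim": [32, 64, 128]}
best, score = random_search(lambda p: p["lr"] * p["dim"], space, n_trials=10)
```

The design point mirrors the paper's recommendation: because `patience` defaults to `None`, the framework cannot silently truncate the search, and the logged configuration makes any opt-in truncation visible in the experiment record.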