🤖 AI Summary
Existing recommendation foundation models have not been evaluated systematically across domains and datasets, which hinders fair benchmarking and capability analysis. Method: We introduce RecBench-MD, a multi-domain, multi-dataset benchmark covering 10 domains and 15 datasets, enabling the first unified evaluation of 19 state-of-the-art foundation models. We propose a zero-resource evaluation framework integrating three paradigms: zero-shot inference, cross-dataset transfer learning, and multi-domain joint training. Contribution/Results: Large-scale empirical analysis reveals that in-domain fine-tuning achieves the best performance, cross-dataset transfer generalizes well to new recommendation scenarios, and multi-domain joint training significantly improves domain adaptability. All code, datasets, and evaluation results are fully open-sourced. This work establishes a reproducible, extensible infrastructure for standardized evaluation, capability attribution, and future advancement of recommendation foundation models.
📝 Abstract
Comprehensive evaluation of the recommendation capabilities of existing foundation models across diverse datasets and domains is essential for advancing recommendation foundation models. In this study, we introduce RecBench-MD, a novel and comprehensive benchmark that assesses the recommendation capabilities of foundation models from a zero-resource, multi-dataset, and multi-domain perspective. Through extensive evaluations of 19 foundation models on 15 datasets spanning 10 diverse domains, including e-commerce, entertainment, and social media, we identify key characteristics of these models in recommendation tasks. Our findings suggest that in-domain fine-tuning achieves optimal performance, while cross-dataset transfer learning provides effective practical support for new recommendation scenarios. Additionally, we observe that multi-domain training significantly enhances the adaptability of foundation models. All code and data have been publicly released to facilitate future research.