Evaluating Recommendation Abilities of Foundation Models: A Multi-Domain, Multi-Dataset Benchmark

📅 2025-08-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing recommendation foundation models lack systematic cross-domain and cross-dataset evaluation, hindering fair benchmarking and capability analysis. Method: We introduce RecBench-MD, a multi-domain, multi-dataset benchmark covering 10 domains and 15 datasets, enabling the first unified evaluation of 19 state-of-the-art foundation models. We propose a zero-resource evaluation framework integrating three paradigms: zero-shot inference, cross-dataset transfer learning, and multi-domain joint training. Contribution/Results: Large-scale empirical analysis reveals that in-domain fine-tuning achieves optimal performance; cross-dataset transfer exhibits strong generalization; and multi-domain joint training significantly improves domain adaptability. All code, datasets, and evaluation results are fully open-sourced. This work establishes a reproducible, extensible infrastructure for standardized evaluation, capability attribution, and future advancement of recommendation foundation models.
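The three evaluation paradigms named above (zero-shot inference, cross-dataset transfer learning, and multi-domain joint training) can be sketched as a small configuration enumerator. This is a hypothetical illustration only; the class and function names, and the dataset names, are not taken from the paper's released code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class EvalSetting:
    name: str                   # which evaluation paradigm this setting instantiates
    train_datasets: List[str]   # datasets the model may fine-tune on before evaluation
    test_dataset: str           # held-out dataset used for evaluation

def build_settings(target: str, all_datasets: List[str]) -> List[EvalSetting]:
    """Enumerate the three paradigms for one target dataset (hypothetical sketch)."""
    others = [d for d in all_datasets if d != target]
    return [
        # Zero-shot: no fine-tuning data at all, evaluate the foundation model directly.
        EvalSetting("zero-shot", [], target),
        # Cross-dataset transfer: fine-tune on a different dataset, test on the target.
        EvalSetting("cross-dataset-transfer", others[:1], target),
        # Multi-domain joint training: fine-tune on all other datasets jointly.
        EvalSetting("multi-domain-joint", others, target),
    ]

# Example with placeholder dataset names:
settings = build_settings("MovieLens", ["MovieLens", "Amazon-Books", "Yelp"])
for s in settings:
    print(s.name, s.train_datasets, "->", s.test_dataset)
```

Under this framing, the reported finding that in-domain fine-tuning is optimal corresponds to a fourth setting whose `train_datasets` would include the target dataset itself.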

📝 Abstract
Comprehensive evaluation of the recommendation capabilities of existing foundation models across diverse datasets and domains is essential for advancing the development of recommendation foundation models. In this study, we introduce RecBench-MD, a novel and comprehensive benchmark designed to assess the recommendation abilities of foundation models from a zero-resource, multi-dataset, and multi-domain perspective. Through extensive evaluations of 19 foundation models across 15 datasets spanning 10 diverse domains -- including e-commerce, entertainment, and social media -- we identify key characteristics of these models in recommendation tasks. Our findings suggest that in-domain fine-tuning achieves optimal performance, while cross-dataset transfer learning provides effective practical support for new recommendation scenarios. Additionally, we observe that multi-domain training significantly enhances the adaptability of foundation models. All code and data have been publicly released to facilitate future research.
Problem

Research questions and friction points this paper is trying to address.

Evaluating foundation models' recommendation capabilities across domains
Assessing zero-resource performance in multi-dataset scenarios
Analyzing cross-domain transfer learning for recommendation systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-dataset multi-domain benchmark evaluation
Zero-resource cross-dataset transfer learning
In-domain fine-tuning for optimal performance
Qijiong Liu
The Hong Kong Polytechnic University
Jieming Zhu
Huawei Noah's Ark Lab, Shenzhen, China
Yingxin Lai
Xiamen University, Xiamen, China
Xiaoyu Dong
The Hong Kong Polytechnic University
Lu Fan
The Hong Kong Polytechnic University
graph mining, low-resource language understanding
Zhipeng Bian
Shenzhen University, Shenzhen, China
Zhenhua Dong
Noah's Ark Lab, Huawei Technologies Co., Ltd.
Recommender systems, causal inference, counterfactual learning, trustworthy AI, machine learning
Xiao-Ming Wu
The Hong Kong Polytechnic University