🤖 AI Summary
Existing recommendation foundation models have not been evaluated systematically across domains and datasets, which hinders fair benchmarking and capability analysis. Method: We introduce RecBench-MD, a multi-domain, multi-dataset benchmark covering 10 domains and 15 datasets, enabling the first unified evaluation of 19 state-of-the-art foundation models. We propose a zero-resource evaluation framework integrating three paradigms: zero-shot inference, cross-dataset transfer learning, and multi-domain joint training. Contribution/Results: Large-scale empirical analysis reveals that in-domain fine-tuning achieves the best performance, cross-dataset transfer generalizes well to new recommendation scenarios, and multi-domain joint training significantly improves domain adaptability. All code, datasets, and evaluation results are fully open-sourced. This work establishes a reproducible, extensible infrastructure for standardized evaluation, capability attribution, and future advancement of recommendation foundation models.
📝 Abstract
Comprehensive evaluation of the recommendation capabilities of existing foundation models across diverse datasets and domains is essential for advancing recommendation foundation models. In this study, we introduce RecBench-MD, a novel and comprehensive benchmark that assesses the recommendation capabilities of foundation models from a zero-resource, multi-dataset, and multi-domain perspective. Through extensive evaluations of 19 foundation models on 15 datasets spanning 10 diverse domains, including e-commerce, entertainment, and social media, we identify key characteristics of these models in recommendation tasks. Our findings suggest that in-domain fine-tuning achieves optimal performance, while cross-dataset transfer learning provides effective practical support for new recommendation scenarios. Additionally, we observe that multi-domain training significantly enhances the adaptability of foundation models. All code and data have been publicly released to facilitate future research.