🤖 AI Summary
This work addresses a limitation of existing agent-based recommender systems: they rely solely on single-dimensional interaction rewards and therefore struggle to evaluate critical intermediate capabilities such as instruction following and factual consistency. To bridge this gap, we propose RecRM-Bench, the first multidimensional reward modeling benchmark tailored for agentic recommender systems. It covers four key dimensions: instruction adherence, factual consistency, query-item relevance, and fine-grained user behavior prediction. Using a large-scale dataset of over one million structured samples, we develop a systematic approach for training multidimensional reward models and introduce an integrated framework for composing hybrid reward functions. The benchmark is publicly released on Hugging Face to move the field from outcome-oriented, single-reward paradigms toward capability-driven, multidimensional reward modeling in recommender systems.
📝 Abstract
The integration of Large Language Model (LLM) agents is transforming recommender systems from simple query-item matching towards deeply personalized and interactive recommendations. Reinforcement Learning (RL) provides an essential framework for optimizing these agents in recommendation tasks. However, current methodologies remain limited by their reliance on single-dimensional, outcome-based rewards that focus exclusively on final user interactions and overlook critical intermediate capabilities such as instruction following and complex intent understanding. Although multi-dimensional rewards are needed, the field lacks a standardized benchmark to support their development. To bridge this gap, we introduce RecRM-Bench, the largest and most comprehensive benchmark to date for agentic recommender systems. It comprises over 1 million structured entries across four core evaluation dimensions: instruction following, factual consistency, query-item relevance, and fine-grained user behavior prediction. By supporting comprehensive assessment from syntactic compliance to complex intent grounding and preference modeling, RecRM-Bench provides a foundational dataset for training sophisticated reward models. Furthermore, we propose a systematic framework for constructing multi-dimensional reward models and for integrating a hybrid reward function, establishing a robust foundation for developing reliable and highly capable agentic recommender systems. The complete RecRM-Bench dataset is publicly available at https://huggingface.co/datasets/wwzeng/RecRM-Bench.
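The abstract does not specify how the hybrid reward function combines the four dimensions. As a rough illustration only, the sketch below assumes each dimension's reward model emits a score in [0, 1] and that the scores are merged by a normalized weighted sum. The names `DimensionScores`, `DEFAULT_WEIGHTS`, and `hybrid_reward`, as well as the equal weights, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, asdict
from typing import Mapping, Optional


@dataclass(frozen=True)
class DimensionScores:
    """Per-dimension reward-model scores, each assumed to lie in [0, 1]."""
    instruction_following: float
    factual_consistency: float
    query_item_relevance: float
    behavior_prediction: float


# Hypothetical equal weights; the paper's actual weighting is not given here.
DEFAULT_WEIGHTS = {
    "instruction_following": 0.25,
    "factual_consistency": 0.25,
    "query_item_relevance": 0.25,
    "behavior_prediction": 0.25,
}


def hybrid_reward(scores: DimensionScores,
                  weights: Optional[Mapping[str, float]] = None) -> float:
    """Combine the four scores into one scalar via a normalized weighted sum."""
    weights = DEFAULT_WEIGHTS if weights is None else weights
    values = asdict(scores)
    if set(weights) != set(values):
        raise ValueError("weights must name exactly the four dimensions")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    for name, value in values.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name} must be in [0, 1]")
    # Dividing by the weight total keeps the result in [0, 1].
    return sum(weights[name] * values[name] for name in values) / total


if __name__ == "__main__":
    example = DimensionScores(
        instruction_following=1.0,
        factual_consistency=0.8,
        query_item_relevance=0.6,
        behavior_prediction=0.4,
    )
    print(hybrid_reward(example))
```

A weighted sum is only one possible composition. A gated scheme, in which a failed instruction-following check zeroes the reward, would be an equally plausible reading of "hybrid".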