UserSimCRS v2: Simulation-Based Evaluation for Conversational Recommender Systems

📅 2025-12-04
🤖 AI Summary
Resources for high-quality, reproducible simulation-based evaluation of conversational recommender systems (CRSs) remain scarce. UserSimCRS v2 upgrades the existing open-source UserSimCRS toolkit to address this gap and bring it in line with current research. It adds four main extensions: (1) an enhanced agenda-based user simulator; (2) user simulators built on large language models (LLMs); (3) integration with a wider range of CRSs and datasets; and (4) LLM-as-a-judge utilities for automated evaluation. The authors demonstrate these extensions in a case study.

📝 Abstract
Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade aligning the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, introduction of large language model-based simulators, integration for a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.
Problem

Research questions and friction points this paper is trying to address.

Addresses scarcity of simulation-based evaluation resources for conversational recommender systems
Upgrades toolkit to align with state-of-the-art research in CRS evaluation
Adds new simulators, evaluation utilities, and integrations with a wider range of CRSs and datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhanced agenda-based user simulator
New large language model-based user simulators
Integration with a wider range of CRSs and datasets
New LLM-as-a-judge evaluation utilities
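
To make the term "agenda-based user simulator" concrete, here is a minimal sketch of the general technique: the simulated user keeps a stack of pending dialogue acts (the agenda), pops one per turn, and pushes new acts in response to what the system says. This is an illustration only and not the UserSimCRS API; the class name `AgendaSimulator`, the intent labels, and the item format are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AgendaSimulator:
    """Toy agenda-based user simulator (hypothetical, not the UserSimCRS API)."""

    preferences: dict  # slot -> value the simulated user wants
    agenda: deque = field(default_factory=deque)

    def __post_init__(self):
        # The deque is used as a stack: the right-hand end is the top.
        # Initial plan: disclose every preference, request a recommendation,
        # and finally end the conversation.
        self.agenda.append(("COMPLETE", None))
        self.agenda.append(("REQUEST_RECOMMENDATION", None))
        for slot, value in reversed(list(self.preferences.items())):
            self.agenda.append(("DISCLOSE", (slot, value)))

    def next_action(self):
        """Pop the next user act from the top of the agenda."""
        return self.agenda.pop() if self.agenda else ("COMPLETE", None)

    def observe(self, agent_intent, item=None):
        """Push new acts onto the agenda in response to the agent's last act."""
        if agent_intent != "RECOMMEND":
            return  # other agent intents leave the agenda unchanged in this sketch
        matches = item is not None and all(
            item.get(slot) == value for slot, value in self.preferences.items()
        )
        if matches:
            self.agenda.append(("ACCEPT", item))
        else:
            # Reject first, then ask for another recommendation.
            self.agenda.append(("REQUEST_RECOMMENDATION", None))
            self.agenda.append(("REJECT", item))


if __name__ == "__main__":
    user = AgendaSimulator({"genre": "comedy", "year": 2020})
    print(user.next_action())  # first preference disclosure
    user.observe("RECOMMEND", {"genre": "drama", "year": 2020})
    print(user.next_action())  # rejection of the non-matching item
```

A stack keeps reactive acts (accept, reject) ahead of the remaining plan, which is the usual motivation for an agenda over a fixed script. An LLM-based simulator would replace the hand-written update rules with generated responses.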