🤖 AI Summary
Simulation-based evaluation resources for conversational recommender systems (CRSs) remain scarce, and the existing UserSimCRS toolkit has fallen behind recent research. This work presents UserSimCRS v2, a major upgrade that brings the toolkit in line with the state of the art through four main extensions: (1) an enhanced agenda-based user simulator; (2) newly introduced large language model (LLM)-based user simulators; (3) integration with a wider range of CRSs and datasets; and (4) LLM-as-a-judge utilities for automated evaluation of simulated dialogues. The extensions are demonstrated in a case study, positioning the toolkit as standardized, reproducible infrastructure for simulation-based CRS evaluation.
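To make the LLM-as-a-judge idea concrete, here is a minimal Python sketch of how such a utility might score a simulated dialogue along several dimensions. The function, prompt, and dimension names are illustrative assumptions, not UserSimCRS's actual API; any LLM backend can be plugged in via the `llm_call` argument.

```python
import json
from typing import Callable, Dict, List

# Illustrative evaluation dimensions (assumed for this sketch, not taken from the toolkit).
DIMENSIONS = ["recommendation_relevance", "dialogue_coherence", "task_completion"]

JUDGE_PROMPT = """You are evaluating a conversation between a user and a
conversational recommender system. Rate the dialogue on a 1-5 scale for each
dimension and answer with a JSON object only, e.g. {{"dialogue_coherence": 4}}.

Dimensions: {dimensions}

Dialogue:
{dialogue}
"""


def judge_dialogue(
    dialogue: List[Dict[str, str]],
    llm_call: Callable[[str], str],
) -> Dict[str, int]:
    """Score a simulated dialogue with an LLM judge.

    dialogue: list of {"speaker": ..., "utterance": ...} turns.
    llm_call: any function mapping a prompt string to the LLM's raw reply.
    """
    transcript = "\n".join(f"{t['speaker']}: {t['utterance']}" for t in dialogue)
    prompt = JUDGE_PROMPT.format(dimensions=", ".join(DIMENSIONS), dialogue=transcript)
    reply = llm_call(prompt)
    scores = json.loads(reply)  # The judge is instructed to reply with JSON only.
    # Keep only the requested dimensions, defaulting missing ones to 0.
    return {dim: int(scores.get(dim, 0)) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Stub LLM for demonstration; replace with a real model call.
    def fake_llm(prompt: str) -> str:
        return json.dumps({d: 4 for d in DIMENSIONS})

    demo_dialogue = [
        {"speaker": "USER", "utterance": "I'm looking for a sci-fi movie."},
        {"speaker": "SYSTEM", "utterance": "How about Arrival (2016)?"},
        {"speaker": "USER", "utterance": "Great, thanks!"},
    ]
    print(judge_dialogue(demo_dialogue, fake_llm))
```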
📝 Abstract
Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade that aligns the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, the introduction of large language model (LLM)-based simulators, integration with a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.
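As a rough illustration of what an LLM-based user simulator involves, the sketch below generates the simulated user's next utterance from a persona and the dialogue history. The class, prompt, and method names are hypothetical and assume only a pluggable LLM backend; they do not reflect the toolkit's real interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List

SIMULATOR_PROMPT = """You are simulating a user talking to a movie recommender system.
Persona: {persona}
Conversation so far:
{history}
Reply with the user's next utterance only."""


@dataclass
class LLMUserSimulator:
    """Hypothetical LLM-based user simulator (not the UserSimCRS API)."""

    persona: str
    llm_call: Callable[[str], str]
    history: List[str] = field(default_factory=list)

    def receive(self, system_utterance: str) -> None:
        """Record the CRS's utterance in the dialogue history."""
        self.history.append(f"SYSTEM: {system_utterance}")

    def respond(self) -> str:
        """Generate the simulated user's next utterance with the LLM."""
        prompt = SIMULATOR_PROMPT.format(
            persona=self.persona, history="\n".join(self.history) or "(empty)"
        )
        utterance = self.llm_call(prompt).strip()
        self.history.append(f"USER: {utterance}")
        return utterance


if __name__ == "__main__":
    # Stub backend for demonstration; swap in a real LLM call.
    simulator = LLMUserSimulator(
        persona="Enjoys slow-paced sci-fi; dislikes horror.",
        llm_call=lambda prompt: "Do you have anything similar to Arrival?",
    )
    simulator.receive("Hi! What kind of movie are you in the mood for?")
    print(simulator.respond())
```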