Towards a Design Guideline for RPA Evaluation: A Survey of Large Language Model-Based Role-Playing Agents

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Evaluating role-playing agents (RPAs) powered by large language models lacks standardized benchmarks, which hinders comparison across tasks and agent designs. Method: Through a systematic literature review of 1,676 papers published between 2021 and 2024, augmented by qualitative coding and multidimensional attribute modeling, the study constructs an evidence-driven, actionable design framework for RPA evaluation. It identifies six agent attributes, seven task attributes, and seven evaluation metrics, yielding a three-dimensional "agent–task–evaluation" taxonomy and a reusable evaluation paradigm. Contribution/Results: The framework provides the first structured, broadly applicable, empirically grounded guideline for RPA evaluation, improving the systematicity, consistency, and comparability of RPA assessments across diverse architectures and application scenarios.

📝 Abstract
A Role-Playing Agent (RPA) is an increasingly popular type of LLM agent that simulates human-like behaviors across a variety of tasks. However, evaluating RPAs is challenging due to diverse task requirements and agent designs. This paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs, derived from a systematic review of 1,676 papers published between Jan. 2021 and Dec. 2024. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics in the existing literature. Based on these findings, we present an RPA evaluation design guideline to help researchers develop more systematic and consistent evaluation methods.
Problem

Research questions and friction points this paper is trying to address.

Develops a design guideline for RPA evaluation
Identifies key attributes for RPA assessment
Proposes systematic RPA evaluation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based RPA evaluation
Systematic literature review
Generalizable design guideline