LLMs vs. Chinese Anime Enthusiasts: A Comparative Study on Emotionally Supportive Role-Playing

📅 2025-08-08

🤖 AI Summary

This study investigates how well large language models (LLMs) perform emotionally supportive role-playing (ESRP) as anime characters. Prior work has treated role-playing and emotional support as separate research directions, so the study introduces ChatAnime, the first ESRP dataset. It covers 20 popular anime characters and 60 emotion-centric scenario questions, and contains 24,000 LLM-generated responses and 2,400 responses written by human fans, supported by over 132,000 human annotations. The evaluation system uses 9 fine-grained metrics across three dimensions (basic dialogue, role-playing, and emotional support), plus an overall metric for response diversity. Across two rounds of dialogue collected from 10 LLMs and 40 experienced Chinese anime enthusiasts, the top-performing LLMs surpass human fans in role-playing and emotional support, while humans still lead in response diversity. The dataset and results are offered as resources for future work on optimizing LLMs for ESRP.

📝 Abstract

Large Language Models (LLMs) have demonstrated impressive capabilities in role-playing conversations and providing emotional support as separate research directions. However, there remains a significant research gap in combining these capabilities to enable emotionally supportive interactions with virtual characters. To address this research gap, we focus on anime characters as a case study because of their well-defined personalities and large fan bases. This choice enables us to effectively evaluate how well LLMs can provide emotional support while maintaining specific character traits. We introduce ChatAnime, the first Emotionally Supportive Role-Playing (ESRP) dataset. We first thoughtfully select 20 top-tier characters from popular anime communities and design 60 emotion-centric real-world scenario questions. Then, we execute a nationwide selection process to identify 40 Chinese anime enthusiasts with profound knowledge of specific characters and extensive experience in role-playing. Next, we systematically collect two rounds of dialogue data from 10 LLMs and these 40 Chinese anime enthusiasts. To evaluate the ESRP performance of LLMs, we design a user experience-oriented evaluation system featuring 9 fine-grained metrics across three dimensions: basic dialogue, role-playing and emotional support, along with an overall metric for response diversity. In total, the dataset comprises 2,400 human-written and 24,000 LLM-generated answers, supported by over 132,000 human annotations. Experimental results show that top-performing LLMs surpass human fans in role-playing and emotional support, while humans still lead in response diversity. We hope this work can provide valuable resources and insights for future research on optimizing LLMs in ESRP. Our datasets are available at https://github.com/LanlanQiu/ChatAnime.
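The response counts in the abstract can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the authors' code: it assumes that every character is paired with every scenario question, that each pairing is answered in both dialogue rounds by each of the 10 LLMs, and that each pairing receives exactly one human answer per round (shared out among the 40 enthusiasts). The abstract does not state this collection design explicitly, and the function and variable names are hypothetical.

```python
# Figures stated in the abstract.
CHARACTERS = 20   # anime characters
SCENARIOS = 60    # emotion-centric scenario questions
ROUNDS = 2        # rounds of dialogue
LLMS = 10         # language models compared


def expected_counts(characters=CHARACTERS, scenarios=SCENARIOS,
                    rounds=ROUNDS, llms=LLMS):
    """Return the response counts implied by a full crossing of
    characters x scenarios x rounds (an assumed design, not one
    confirmed by the source)."""
    # One answer slot per (character, scenario, round) combination.
    slots = characters * scenarios * rounds
    return {
        "human": slots,        # assumed: one human answer per slot
        "llm": slots * llms,   # every LLM answers every slot
    }


counts = expected_counts()
print(counts)  # {'human': 2400, 'llm': 24000}
```

Under these assumptions the totals match the 2,400 human-written and 24,000 LLM-generated answers reported above.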

Problem

Research questions and friction points this paper is trying to address.

Combining role-playing and emotional support in LLMs

Evaluating how well LLMs provide emotional support while staying in character as anime characters

Comparing LLMs and humans in ESRP performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines anime character role-playing and emotional support into a single LLM task (ESRP)

Introduces ChatAnime dataset with human and LLM dialogues

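Evaluates ESRP using 9 fine-grained metrics across three dimensions (basic dialogue, role-playing, and emotional support), plus an overall response diversity metric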
