Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited behavioral diversity and controllability of existing seeker simulators used to evaluate emotional support chatbots, which keep them from faithfully replicating real-world user behavior. The authors propose a controllable seeker simulator driven by nine psychological and linguistic features and trained on authentic Reddit conversations with a Mixture-of-Experts (MoE) architecture, which separates diverse seeker behaviors into specialized parameter subspaces for fine-grained control. In experiments, the simulator achieves better profile adherence and behavioral diversity than existing approaches. When used to evaluate seven prominent supporter models, it uncovers performance degradations that were previously obscured, making evaluation more realistic and better stress-tested.

📝 Abstract
As emotional support chatbots have recently gained significant traction across both research and industry, a common evaluation strategy has emerged: use help-seeker simulators to interact with supporter chatbots. However, current simulators suffer from two critical limitations: (1) they fail to capture the behavioral diversity of real-world seekers, often portraying them as overly cooperative, and (2) they lack the controllability required to simulate specific seeker profiles. To address these challenges, we present a controllable seeker simulator driven by nine psychological and linguistic features that underpin seeker behavior. Using authentic Reddit conversations, we train our model via a Mixture-of-Experts (MoE) architecture, which effectively differentiates diverse seeker behaviors into specialized parameter subspaces, thereby enhancing fine-grained controllability. Our simulator achieves superior profile adherence and behavioral diversity compared to existing approaches. Furthermore, evaluating 7 prominent supporter models with our system uncovers previously obscured performance degradations. These findings underscore the utility of our framework in providing a more faithful and stress-tested evaluation for emotional support chatbots.
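
The routing idea behind a profile-conditioned Mixture-of-Experts layer can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not name the nine features, the number of experts, or how routing is computed, so `NUM_EXPERTS`, `TOP_K`, the `gate` function, and the toy scaling "experts" are all hypothetical. The sketch only shows how a nine-dimensional profile vector could select and weight a few experts.

```python
import math
import random

NUM_FEATURES = 9  # the abstract states nine features; their identities are not given here
NUM_EXPERTS = 4   # assumed value, for illustration only
TOP_K = 2         # assumed value, for illustration only


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate(profile, gate_weights, top_k=TOP_K):
    """Map a profile vector to {expert_index: weight} for the top_k experts.

    Routing on the profile alone is an assumption of this sketch.
    """
    scores = [sum(w * x for w, x in zip(row, profile)) for row in gate_weights]
    probs = softmax(scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Renormalize so the selected experts' weights sum to 1.
    return {i: probs[i] / norm for i in chosen}


def moe_output(hidden, profile, gate_weights, experts):
    """Combine the selected experts' outputs, weighted by the gate."""
    routing = gate(profile, gate_weights)
    out = [0.0] * len(hidden)
    for idx, weight in routing.items():
        expert_out = experts[idx](hidden)
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out


# Toy setup: random gate weights and experts that merely scale their input.
rng = random.Random(0)
gate_weights = [[rng.uniform(-1, 1) for _ in range(NUM_FEATURES)]
                for _ in range(NUM_EXPERTS)]
experts = [lambda h, s=s: [s * v for v in h] for s in (0.5, 1.0, 1.5, 2.0)]

# Hypothetical profile: nine placeholder feature values.
profile = [1.0, 0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.5, 1.0]
routing = gate(profile, gate_weights)
output = moe_output([1.0, 2.0], profile, gate_weights, experts)
```

In this toy version, different profile vectors yield different expert mixtures, which is one way specialized parameter subspaces could give fine-grained control. A trained model would learn the gate and the experts jointly from data rather than use fixed random weights.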
Problem

Research questions and friction points this paper is trying to address.

emotional support evaluation
seeker simulator
behavioral diversity
controllability
chatbot evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable simulator
behavioral diversity
Mixture-of-Experts
emotional support evaluation
seeker profiling
Authors

Chaewon Heo
Graduate School of Data Science, Seoul National University
Cheyon Jin
Graduate School of Data Science, Seoul National University
Yohan Jo
Seoul National University
Natural Language Processing, Agents, Computational Psychology, Reasoning