Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator

📅 2026-01-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited behavioral diversity and controllability of existing seeker simulators used to evaluate emotional support chatbots, limitations that prevent the simulators from faithfully replicating real-world user behavior. The authors propose a controllable seeker simulator that integrates nine psychological and linguistic traits within a Mixture-of-Experts (MoE) architecture and is trained on authentic Reddit conversation data. The approach enables fine-grained modeling of, and control over, diverse help-seeking behaviors. Experiments show that the simulator outperforms existing alternatives in both behavioral diversity and consistency with user profiles. When used to evaluate seven prominent emotional support models, it uncovers performance degradation that earlier evaluations had obscured, making model evaluation more realistic and better suited to stress-testing.
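The evaluation setup summarized above, in which a profile-conditioned seeker simulator converses with a supporter model, can be sketched roughly as follows. This is a minimal illustration and not the authors' code: `SeekerProfile`, `run_dialogue`, `toy_seeker`, `toy_supporter`, and the trait name `cooperativeness` are all hypothetical, and the paper's nine features are not listed in this summary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)


@dataclass
class SeekerProfile:
    # Hypothetical trait names and values, used only for illustration.
    traits: Dict[str, float] = field(default_factory=dict)


def run_dialogue(
    seeker: Callable[[SeekerProfile, List[Turn]], str],
    supporter: Callable[[List[Turn]], str],
    profile: SeekerProfile,
    max_turns: int = 3,
) -> List[Turn]:
    """Alternate seeker and supporter turns and return the transcript."""
    history: List[Turn] = []
    for _ in range(max_turns):
        history.append(("seeker", seeker(profile, history)))
        history.append(("supporter", supporter(history)))
    return history


def toy_seeker(profile: SeekerProfile, history: List[Turn]) -> str:
    # Stand-in for a trained simulator: a less cooperative profile pushes back.
    if profile.traits.get("cooperativeness", 0.5) < 0.5:
        return "I don't think that would help."
    return "Thanks, I can try that."


def toy_supporter(history: List[Turn]) -> str:
    # Stand-in for the supporter chatbot under evaluation.
    return "That sounds hard. What has felt most difficult?"


resistant = SeekerProfile(traits={"cooperativeness": 0.1})
transcript = run_dialogue(toy_seeker, toy_supporter, resistant, max_turns=2)
```

Running the same supporter against several profiles and scoring each transcript is what would expose degradation on less cooperative seekers. The scoring step is omitted here because the summary does not describe the metrics.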

📝 Abstract

As emotional support chatbots have recently gained significant traction across both research and industry, a common evaluation strategy has emerged: use help-seeker simulators to interact with supporter chatbots. However, current simulators suffer from two critical limitations: (1) they fail to capture the behavioral diversity of real-world seekers, often portraying them as overly cooperative, and (2) they lack the controllability required to simulate specific seeker profiles. To address these challenges, we present a controllable seeker simulator driven by nine psychological and linguistic features that underpin seeker behavior. Using authentic Reddit conversations, we train our model via a Mixture-of-Experts (MoE) architecture, which effectively differentiates diverse seeker behaviors into specialized parameter subspaces, thereby enhancing fine-grained controllability. Our simulator achieves superior profile adherence and behavioral diversity compared to existing approaches. Furthermore, evaluating 7 prominent supporter models with our system uncovers previously obscured performance degradations. These findings underscore the utility of our framework in providing a more faithful and stress-tested evaluation for emotional support chatbots.
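To make the idea of routing seeker behaviors to specialized experts more concrete, here is a minimal pure-Python sketch of a softmax-gated mixture of experts. It assumes the gate is conditioned on a nine-dimensional profile vector, which the abstract does not confirm. The expert count, hidden size, random weights, and the class name `ProfileGatedMoE` are all hypothetical.

```python
import math
import random

NUM_FEATURES = 9  # nine seeker features per the abstract; the numeric encoding is assumed
NUM_EXPERTS = 4   # hypothetical expert count
HIDDEN = 8        # hypothetical hidden size


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class ProfileGatedMoE:
    """Toy dense MoE layer: the profile picks the expert mixture (an assumption)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.gate = random_matrix(NUM_EXPERTS, NUM_FEATURES, rng)
        self.experts = [random_matrix(HIDDEN, HIDDEN, rng) for _ in range(NUM_EXPERTS)]

    def route(self, profile):
        # One weight per expert, summing to 1.
        return softmax(matvec(self.gate, profile))

    def forward(self, hidden, profile):
        weights = self.route(profile)
        outputs = [matvec(expert, hidden) for expert in self.experts]
        # Weighted sum of expert outputs, computed per hidden dimension.
        return [
            sum(w * out[i] for w, out in zip(weights, outputs))
            for i in range(HIDDEN)
        ]


layer = ProfileGatedMoE(seed=0)
neutral_weights = layer.route([0.0] * NUM_FEATURES)
skewed_weights = layer.route([1.0] * NUM_FEATURES)
mixed = layer.forward([1.0] * HIDDEN, [1.0] * NUM_FEATURES)
```

With an all-zero profile the gate scores are all zero, so every expert receives equal weight. A non-zero profile shifts weight toward particular experts, which is the sense in which different seeker behaviors could occupy different parameter subspaces. A trained model would learn the gate and expert weights rather than sample them at random.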

Problem

Research questions and friction points this paper is trying to address.

emotional support evaluation

seeker simulator

behavioral diversity

controllability

chatbot evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable simulator

behavioral diversity

Mixture-of-Experts

emotional support evaluation

seeker profiling
