Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit low strategy planning accuracy and a pronounced strategy preference bias in emotional support conversations (ESCs), yet the underlying causes remain unclear. This paper identifies, for the first time, that LLMs' strategy planning capability is fundamentally constrained by implicit knowledge boundaries, and reveals a strong correlation between these boundaries and the preference bias. To address this, the authors propose an uncertainty-aware dual-reward reinforcement learning framework that jointly optimizes strategy accuracy and entropy-based confidence, calibrating strategy selection across multiple backbone LLMs. Experiments on ESCov and ExTES show significant improvements: +12.3% in strategy accuracy and −38.7% in bias measurement. The work establishes a paradigm for building trustworthy, bias-mitigated emotional support dialogue systems grounded in explicit uncertainty modeling and knowledge-boundary awareness.

📝 Abstract
Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) face persistent challenges in delivering effective ESC due to low accuracy in strategy planning. Moreover, they show a considerable preference bias towards specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. We then propose an approach to mitigate the bias via reinforcement learning with a dual reward function, which optimizes strategy planning through both accuracy and entropy-based confidence for each region defined by the knowledge boundaries. Experiments on the ESCov and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Mitigating strategy preference bias in emotional support conversations
Addressing low accuracy in strategy planning for LLMs
Reducing bias via uncertainty estimations and reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning with dual reward function
Optimizing strategy planning via accuracy and entropy
Identifying knowledge boundaries to mitigate bias
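The dual-reward idea above can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: the function names, the normalization, and the weighting parameter `alpha` are assumptions. It shows how a reward can combine strategy accuracy with an entropy-based confidence term, so that a peaked (confident) strategy distribution earns more than a near-uniform (uncertain) one.

```python
import math

def entropy(probs):
    """Shannon entropy of a strategy probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def dual_reward(probs, predicted, gold, num_strategies, alpha=0.5):
    """Hypothetical dual reward: accuracy plus an entropy-based confidence bonus.

    Confidence is 1 minus the normalized entropy, so a peaked distribution
    over strategies earns a larger bonus than a near-uniform one.
    """
    accuracy = 1.0 if predicted == gold else 0.0
    max_entropy = math.log(num_strategies)  # entropy of the uniform distribution
    confidence = 1.0 - entropy(probs) / max_entropy
    return accuracy + alpha * confidence

# A confident correct prediction outscores an uncertain correct one.
peaked = [0.85, 0.05, 0.05, 0.05]
uniform = [0.25, 0.25, 0.25, 0.25]
r_confident = dual_reward(peaked, predicted=0, gold=0, num_strategies=4)
r_uncertain = dual_reward(uniform, predicted=0, gold=0, num_strategies=4)
```

Under this sketch, `r_uncertain` reduces to the pure accuracy reward (the uniform distribution has zero confidence bonus), while `r_confident` adds a bonus proportional to how peaked the distribution is.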
Authors

Yougen Zhou
Shanghai Institute of Artificial Intelligence for Education, East China Normal University, Shanghai, China

Qin Chen
School of Computer Science and Technology, East China Normal University, Shanghai, China

Ningning Zhou
East China Normal University

Jie Zhou
School of Computer Science and Technology, East China Normal University, Shanghai, China

Xingjiao Wu
East China Normal University

Liang He
Shanghai Institute of Artificial Intelligence for Education, School of Computer Science and Technology, East China Normal University, Shanghai, China