Mitigating Strategy Preference Bias in Emotional Support Conversation via Uncertainty Estimations

📅 2025-09-16

🤖 AI Summary

Large language models (LLMs) show low strategy-planning accuracy and a pronounced preference bias toward specific strategies in emotional support conversation (ESC), yet the causes of this bias remain unclear. This paper traces the bias to the knowledge boundaries of LLMs in strategy planning and shows that the two are strongly linked. To address it, the paper proposes a reinforcement learning framework with a dual reward that jointly optimizes strategy accuracy and entropy-based confidence, applied to each region defined by the identified knowledge boundaries. Experiments on ESConv and ExTES with multiple backbone LLMs report a +12.3% gain in strategy accuracy and a 38.7% reduction in the bias measure. The work suggests that explicit uncertainty modeling and awareness of knowledge boundaries are a promising basis for trustworthy, less biased emotional support dialogue systems.
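The "entropy-based confidence" in the summary can be illustrated with a minimal sketch. It assumes the strategy planner exposes a probability distribution over candidate strategies (for example, a softmax over strategy logits or the frequencies of several sampled plans) and defines confidence as one minus the normalized Shannon entropy. The function names `normalized_entropy` and `entropy_confidence` are hypothetical, and the paper's exact estimator may differ.

```python
import math


def normalized_entropy(probs):
    """Shannon entropy of a strategy distribution divided by log(K), so it lies in [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0  # a single candidate strategy carries no uncertainty
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    h = 0.0
    for p in probs:
        q = p / total  # renormalize in case the inputs do not sum exactly to 1
        if q > 0:
            h -= q * math.log(q)
    return h / math.log(k)


def entropy_confidence(probs):
    """Hypothetical confidence score: 1.0 for a one-hot distribution, 0.0 for a uniform one."""
    return 1.0 - normalized_entropy(probs)


if __name__ == "__main__":
    # Illustrative distributions over four candidate strategies.
    print(entropy_confidence([1.0, 0.0, 0.0, 0.0]))      # fully confident
    print(entropy_confidence([0.25, 0.25, 0.25, 0.25]))  # maximally uncertain
    print(entropy_confidence([0.7, 0.1, 0.1, 0.1]))      # somewhere in between
```

Normalizing by log(K) keeps the score comparable across datasets with different numbers of strategy labels.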

📝 Abstract

Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) face persistent challenges in delivering effective ESC due to low accuracy in strategy planning. Moreover, they show a considerable preference bias toward specific strategies. Prior methods using fine-tuned strategy planners have shown potential in reducing such bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. Then, we propose an approach to mitigate the bias by reinforcement learning with a dual reward function, which optimizes strategy planning via both accuracy and entropy-based confidence for each region according to the knowledge boundaries. Experiments on the ESConv and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
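A dual reward of the kind the abstract describes could be sketched as an accuracy term plus a confidence term whose weight depends on the knowledge-boundary region. This is a minimal illustration under stated assumptions, not the paper's reward: the three regions, the ±1 accuracy reward, the per-region weights, and the names `Region`, `DEFAULT_CONF_WEIGHTS`, and `dual_reward` are all hypothetical, and confidence is assumed to be an entropy-based score in [0, 1].

```python
from enum import Enum


class Region(Enum):
    """Hypothetical knowledge-boundary regions for a dialogue context."""
    KNOWN = "known"          # the planner usually selects the gold strategy
    UNCERTAIN = "uncertain"  # the planner is right only some of the time
    UNKNOWN = "unknown"      # the planner usually selects a wrong strategy


# Illustrative weights on the confidence term per region (assumed, not from the paper).
DEFAULT_CONF_WEIGHTS = {Region.KNOWN: 0.5, Region.UNCERTAIN: 0.25, Region.UNKNOWN: 1.0}


def dual_reward(pred, gold, confidence, region, conf_weights=None):
    """Accuracy reward plus a region-weighted, entropy-based confidence reward.

    confidence is assumed to lie in [0, 1], e.g. 1 - normalized entropy of the
    planner's strategy distribution.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    weights = DEFAULT_CONF_WEIGHTS if conf_weights is None else conf_weights
    correct = pred == gold
    r_acc = 1.0 if correct else -1.0
    # Reward confidence on correct plans and penalize it on wrong ones, so a
    # favourite strategy chosen confidently but wrongly is punished.
    r_conf = confidence if correct else -confidence
    return r_acc + weights[region] * r_conf


if __name__ == "__main__":
    print(dual_reward("Question", "Question", 0.9, Region.KNOWN))
    print(dual_reward("Providing Suggestions", "Reflection of Feelings", 0.8, Region.UNKNOWN))
```

Weighting the confidence term most heavily in the unknown region is one plausible way to discourage a model from confidently defaulting to a preferred strategy where it lacks knowledge; the paper's actual reward shaping may differ.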

Problem

Mitigating strategy preference bias in emotional support conversations

Addressing low accuracy in strategy planning for LLMs

Reducing bias via uncertainty estimations and reinforcement learning
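The abstract does not define how preference bias is measured, so the sketch below uses a simple hypothetical proxy rather than the paper's metric: the total variation distance between how often the planner predicts each strategy and how often the reference responses use it. The strategy list is the set of eight strategy categories annotated in ESConv; `strategy_distribution` and `preference_bias` are illustrative names.

```python
from collections import Counter

# The eight strategy categories annotated in the ESConv dataset.
ESCONV_STRATEGIES = (
    "Question",
    "Restatement or Paraphrasing",
    "Reflection of Feelings",
    "Self-disclosure",
    "Affirmation and Reassurance",
    "Providing Suggestions",
    "Information",
    "Others",
)


def strategy_distribution(labels, strategies):
    """Relative frequency of each strategy label in a list of predictions or references."""
    if not labels:
        raise ValueError("labels must be non-empty")
    unknown = set(labels) - set(strategies)
    if unknown:
        raise ValueError(f"unknown strategy labels: {sorted(unknown)}")
    counts = Counter(labels)
    return {s: counts[s] / len(labels) for s in strategies}


def preference_bias(predicted, gold, strategies=ESCONV_STRATEGIES):
    """Hypothetical bias proxy: total variation distance between predicted and gold frequencies.

    0.0 means the planner uses each strategy as often as the references do;
    values near 1.0 mean it concentrates on strategies the references rarely use.
    """
    p = strategy_distribution(predicted, strategies)
    q = strategy_distribution(gold, strategies)
    return 0.5 * sum(abs(p[s] - q[s]) for s in strategies)


if __name__ == "__main__":
    gold = ["Question", "Reflection of Feelings", "Providing Suggestions", "Question"]
    skewed = ["Providing Suggestions"] * 4
    print(preference_bias(skewed, gold))  # 0.75
    print(preference_bias(gold, gold))    # 0.0
```

A frequency-level measure like this captures skew toward a favourite strategy even when per-turn accuracy looks acceptable, which is why it complements a plain accuracy score.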

Innovation

Reinforcement learning with dual reward function

Optimizing strategy planning via accuracy and entropy-based confidence

Identifying knowledge boundaries to mitigate bias
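The abstract does not say how the knowledge boundaries are identified. One plausible approach, shown here only as a minimal sketch under that assumption, is to sample the planner several times per dialogue context and bucket each context by how often it picks the gold strategy. The thresholds `high` and `low`, the region names, and the functions `sample_accuracy`, `assign_region`, and `region_histogram` are hypothetical.

```python
from collections import Counter


def sample_accuracy(samples, gold):
    """Fraction of sampled strategy predictions that match the gold strategy."""
    if not samples:
        raise ValueError("need at least one sampled prediction")
    return sum(s == gold for s in samples) / len(samples)


def assign_region(samples, gold, high=0.8, low=0.2):
    """Bucket one dialogue context into a hypothetical knowledge-boundary region."""
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low < high <= 1")
    acc = sample_accuracy(samples, gold)
    if acc >= high:
        return "known"      # the planner reliably knows the right strategy
    if acc <= low:
        return "unknown"    # the planner reliably picks a wrong strategy
    return "uncertain"      # the planner is inconsistent here


def region_histogram(dataset, high=0.8, low=0.2):
    """Count contexts per region; dataset yields (sampled_strategies, gold_strategy) pairs."""
    return Counter(assign_region(samples, gold, high, low) for samples, gold in dataset)


if __name__ == "__main__":
    dataset = [
        (["Question"] * 5, "Question"),
        (["Question", "Question", "Others", "Others", "Others"], "Question"),
        (["Providing Suggestions"] * 4 + ["Self-disclosure"], "Self-disclosure"),
    ]
    print(region_histogram(dataset))
```

The third example shows the pattern the paper associates with bias: the planner repeatedly falls back on one strategy ("Providing Suggestions") in a context where it lacks the knowledge to pick the right one.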