🤖 AI Summary
Large language models (LLMs) exhibit low strategy planning accuracy and a pronounced strategy preference bias in emotional support conversation (ESC), yet the underlying causes remain unclear. This paper identifies, for the first time, that LLMs' strategy planning capability is constrained by implicit knowledge boundaries, and reveals a strong correlation between these boundaries and the preference bias. To address this, the authors propose an uncertainty-aware reinforcement learning framework with a dual reward that jointly optimizes strategy accuracy and entropy-based confidence. The approach integrates fine-tuning with collaborative inference across multiple backbone LLMs to calibrate strategy selection. Experiments on ESConv and ExTES demonstrate significant improvements: +12.3% in strategy accuracy and a 38.7% reduction in measured preference bias. This work establishes a paradigm for building trustworthy, bias-mitigated emotional support dialogue systems grounded in explicit uncertainty modeling and knowledge-boundary awareness.
📝 Abstract
Emotional support conversation (ESC) aims to alleviate distress through empathetic dialogue, yet large language models (LLMs) struggle to deliver effective ESC because their strategy planning accuracy is low. They also show a considerable preference bias towards specific strategies. Prior methods based on fine-tuned strategy planners have shown potential in reducing this bias, but the underlying causes of the preference bias in LLMs have not been well studied. To address these issues, we first reveal the fundamental causes of the bias by identifying the knowledge boundaries of LLMs in strategy planning. We then propose an approach that mitigates the bias through reinforcement learning with a dual reward function, which optimizes strategy planning via both accuracy and entropy-based confidence for each region defined by the knowledge boundaries. Experiments on the ESConv and ExTES datasets with multiple LLM backbones show that our approach outperforms the baselines, confirming its effectiveness.
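The abstract does not give the form of the dual reward, so the following is only an illustrative sketch of how an accuracy term and an entropy-based confidence term might be combined. The function names, the `weight` parameter, the binary `in_known_region` flag, and the choice to reward confidence inside the knowledge boundary while penalizing it outside are all assumptions made for illustration, not the paper's definition.

```python
import math
from typing import Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of a strategy distribution, scaled to [0, 1]."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(k)


def dual_reward(
    probs: Sequence[float],
    predicted: int,
    gold: int,
    in_known_region: bool,
    weight: float = 0.5,  # hypothetical trade-off between the two terms
) -> float:
    """Hypothetical dual reward: accuracy plus a region-dependent confidence term."""
    accuracy = 1.0 if predicted == gold else 0.0
    # Confidence is high when the distribution over strategies is peaked.
    confidence = 1.0 - normalized_entropy(probs)
    # Assumption: confidence is rewarded for samples inside the model's
    # knowledge boundary and penalized for samples outside it.
    confidence_term = confidence if in_known_region else -confidence
    return accuracy + weight * confidence_term


# A correct, fully confident prediction inside the known region.
print(dual_reward([1.0, 0.0, 0.0, 0.0], predicted=0, gold=0, in_known_region=True))
# A wrong, fully confident prediction outside the known region.
print(dual_reward([1.0, 0.0, 0.0, 0.0], predicted=0, gold=1, in_known_region=False))
```

Under these assumptions, a policy trained on such a signal is pushed towards peaked strategy distributions only where it tends to be correct, which is one plausible way an entropy term could counteract overconfident preference for a few strategies.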