AI Summary
This work addresses the challenge of efficiently capturing and generalizing users' implicit and dynamically evolving needs in multi-turn emotional support dialogues. To this end, the authors propose a gradient-free active dialogue learning framework that innovatively integrates a theory-of-mind mechanism to explicitly model uncertainty in user needs. Leveraging this representation, an active learning strategy is devised to prioritize system responses that are expected to elicit high-information feedback from users. This approach enables efficient exploration during training and enhances response robustness at inference time. Experimental results demonstrate that the proposed framework consistently outperforms strong baselines across multiple dialogue benchmarks and model architectures, achieving significant improvements in both dialogue quality and user alignment.
Abstract
Emotional support plays an important role in dialogue systems, and its success depends on adapting to a user's evolving and implicit needs across multi-turn interactions while leveraging the strong reasoning capacity of large language models. However, since signals about user needs are often weak and indirect, and can only be disambiguated through multi-turn interaction, existing emotional support methods often struggle to acquire and generalize relevant conversational knowledge efficiently. To bridge this gap, we introduce User-Aware Active Knowledge Acquisition (UKA), a gradient-free active dialogue learning framework that explicitly represents uncertainty about user needs and incorporates active learning into both knowledge acquisition and response selection. We propose a Theory-of-Mind uncertainty estimation mechanism that allows the model to prioritize responses expected to elicit more informative user feedback. UKA efficiently explores user-aligned conversational knowledge during training while maintaining robustness at test time. Experiments across multiple dialogue benchmarks and model architectures demonstrate that our approach consistently outperforms strong baselines in dialogue quality and user alignment.