🤖 AI Summary
To address low user engagement in social-driven dialogue systems, this paper proposes an interactive alignment method that leverages real-time user feedback as a reward signal. Methodologically, it introduces (1) the first “intent-response-as-reward” interactive alignment paradigm; (2) interactive Monte Carlo Tree Search (i×MCTS), which simulates dialogue evolution to generate high-quality preference data; and (3) an end-to-end framework integrating a user simulator, i×MCTS, Direct Preference Optimization (DPO), and interactive fine-tuning. Evaluated on empathetic support and benevolent persuasion tasks, the approach achieves substantial improvements over state-of-the-art baselines: +28.6% in session duration, +34.1% in response engagement, and +22.3% in dialogue completion rate—demonstrating comprehensive gains in user retention and interaction quality.
📝 Abstract
Enhancing user engagement through interactions plays an essential role in socially-driven dialogues. While prior works have optimized models to reason over relevant knowledge or to plan a flow of dialogue acts, the relationship between user engagement and knowledge or dialogue acts is subtle, and neither guarantees user engagement in socially-driven dialogues. To this end, we enable interactive LLMs to learn user engagement by leveraging signals from the future development of conversations. Specifically, we adopt a more direct and relevant indicator of user engagement, i.e., the user's reaction related to the dialogue intention after the interaction, as a reward to align interactive LLMs. To achieve this, we develop a user simulator that interacts with target interactive LLMs, and we explore interactions between the user and the interactive LLM system via *i×MCTS* (*M*onte *C*arlo *T*ree *S*earch for *i*nteraction). In this way, we collect a dataset containing pairs of higher- and lower-quality experiences using *i×MCTS*, and accordingly align interactive LLMs toward high-level user engagement via direct preference optimization (DPO). Experiments conducted on two socially-driven dialogue scenarios (emotional support conversations and persuasion for good) demonstrate that our method effectively enhances user engagement in interactive LLMs.
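The pipeline the abstract describes (simulate future user reactions, rank rollouts, keep the best and worst as a preference pair for DPO) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the full method uses *i×MCTS* to search the interaction tree, whereas here a simple best-of-n selection stands in for the search, and `user_simulator_reward` is a hypothetical toy stand-in for the simulated user's intention-related reaction.

```python
import random

random.seed(0)

def user_simulator_reward(response: str) -> float:
    """Toy engagement signal (hypothetical). In the paper, a learned user
    simulator reacts to the response, and the reaction related to the
    dialogue intention over the conversation's future serves as reward."""
    # Stand-in heuristic: richer responses score higher, plus noise.
    return len(set(response.lower().split())) + random.random()

def collect_preference_pair(candidates):
    """Score each candidate continuation with the simulated user and keep
    the best/worst as a (chosen, rejected) pair for DPO-style alignment.
    The real method explores candidates via i×MCTS rather than best-of-n."""
    scored = sorted(candidates, key=user_simulator_reward, reverse=True)
    return {"chosen": scored[0], "rejected": scored[-1]}

# Candidate continuations sampled from the interactive LLM (illustrative).
candidates = [
    "I hear you, that sounds really hard. What happened next?",
    "Okay.",
    "Have you tried just not worrying about it?",
]

pair = collect_preference_pair(candidates)
print(pair)
```

Pairs collected this way form the preference dataset; DPO then increases the likelihood of the `chosen` responses relative to the `rejected` ones, aligning the model toward continuations that keep the simulated user engaged.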