🤖 AI Summary
To address low user engagement in social-driven dialogue systems, this paper proposes an interactive alignment method that leverages real-time user feedback as a reward signal. Methodologically, it introduces (1) the first “intent-response-as-reward” interactive alignment paradigm; (2) interactive Monte Carlo Tree Search (i×MCTS), which simulates dialogue evolution to generate high-quality preference data; and (3) an end-to-end framework integrating a user simulator, i×MCTS, Direct Preference Optimization (DPO), and interactive fine-tuning. Evaluated on empathetic support and benevolent persuasion tasks, the approach achieves substantial improvements over state-of-the-art baselines: +28.6% in session duration, +34.1% in response engagement, and +22.3% in dialogue completion rate—demonstrating comprehensive gains in user retention and interaction quality.
📝 Abstract
Enhancing user engagement through interactions plays an essential role in socially-driven dialogues. While prior works have optimized models to reason over relevant knowledge or to plan a flow of dialogue acts, the relationship between user engagement and knowledge or dialogue acts is subtle, and neither guarantees user engagement in socially-driven dialogues. To this end, we enable interactive LLMs to learn user engagement by leveraging signals from the future development of conversations. Specifically, we adopt a more direct and relevant indicator of user engagement, i.e., the user's reaction related to the dialogue intention after the interaction, as a reward to align interactive LLMs. To achieve this, we develop a user simulator that interacts with target interactive LLMs, and we explore interactions between the user and the interactive LLM system via *i×MCTS* (*M*onte *C*arlo *T*ree *S*earch for *i*nteraction). In this way, we collect a dataset containing pairs of higher- and lower-quality experiences using *i×MCTS*, and accordingly align interactive LLMs toward high-level user engagement via direct preference optimization (DPO). Experiments conducted on two socially-driven dialogue scenarios (emotional support conversations and persuasion for good) demonstrate that our method effectively enhances user engagement in interactive LLMs.
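The pipeline the abstract describes (simulate future user reactions, rank rollouts, keep the best and worst as a preference pair for DPO) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the full method uses *i×MCTS* to search the interaction tree, whereas here a simple best-of-n selection stands in for the search, and `user_simulator_reward` is a hypothetical toy stand-in for the simulated user's intention-related reaction.

```python
import random

random.seed(0)

def user_simulator_reward(response: str) -> float:
    """Toy engagement signal (hypothetical). In the paper, a learned user
    simulator reacts to the response, and the reaction related to the
    dialogue intention over the conversation's future serves as reward."""
    # Stand-in heuristic: richer responses score higher, plus noise.
    return len(set(response.lower().split())) + random.random()

def collect_preference_pair(candidates):
    """Score each candidate continuation with the simulated user and keep
    the best/worst as a (chosen, rejected) pair for DPO-style alignment.
    The real method explores candidates via i×MCTS rather than best-of-n."""
    scored = sorted(candidates, key=user_simulator_reward, reverse=True)
    return {"chosen": scored[0], "rejected": scored[-1]}

# Candidate continuations sampled from the interactive LLM (illustrative).
candidates = [
    "I hear you, that sounds really hard. What happened next?",
    "Okay.",
    "Have you tried just not worrying about it?",
]

pair = collect_preference_pair(candidates)
print(pair)
```

Pairs collected this way form the preference dataset; DPO then increases the likelihood of the `chosen` responses relative to the `rejected` ones, aligning the model toward continuations that keep the simulated user engaged.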