Proactive Guidance of Multi-Turn Conversation in Industrial Search

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To balance dynamic goal drift against low-latency requirements in multi-turn industrial search dialogues, this paper proposes a two-stage proactive guidance framework. In Stage I, Goal-adaptive Supervised Fine-Tuning (G-SFT) dynamically tracks evolving user intent. In Stage II, Click-oriented Reinforcement Learning (C-RL) automatically constructs preference pairs from implicit user click signals, decoupling goal-identification optimization from click-through-rate (CTR) improvement. The framework integrates knowledge distillation, a generate-rank paradigm, and lightweight deployment strategies. Experiments demonstrate significant improvements: offline goal-identification accuracy reaches 86.10% (+23.95 percentage points), online CTR rises to 25.28% (a 149.06% relative improvement), and inference latency drops by 69.55%. These results validate the framework's effectiveness in delivering high accuracy, strong engagement, and real-time responsiveness for industrial conversational search systems.

📝 Abstract
The evolution of Large Language Models (LLMs) has significantly advanced multi-turn conversation systems, emphasizing the need for proactive guidance to enhance users' interactions. However, these systems face challenges in dynamically adapting to shifts in users' goals and maintaining low latency for real-time interactions. In the Baidu Search AI assistant, an industrial-scale multi-turn search system, we propose a novel two-phase framework to provide proactive guidance. The first phase, Goal-adaptive Supervised Fine-Tuning (G-SFT), employs a goal adaptation agent that dynamically adapts to user goal shifts and provides goal-relevant contextual information. G-SFT also incorporates scalable knowledge transfer to distill insights from LLMs into a lightweight model for real-time interaction. The second phase, Click-oriented Reinforcement Learning (C-RL), adopts a generate-rank paradigm, systematically constructs preference pairs from user click signals, and proactively improves click-through rates through more engaging guidance. This dual-phase architecture achieves complementary objectives: G-SFT ensures accurate goal tracking, while C-RL optimizes interaction quality through click signal-driven reinforcement learning. Extensive experiments demonstrate that our framework achieves 86.10% accuracy in offline evaluation (+23.95% over baseline) and 25.28% CTR in online deployment (149.06% relative improvement), while reducing inference latency by 69.55% through scalable knowledge distillation.
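The abstract describes C-RL as "systematically constructing preference pairs from user click signals." A minimal sketch of such a construction is shown below, assuming a simple log schema with hypothetical `query`, `guidance`, and `clicked` fields and a within-query pairing rule; the paper's actual schema and pairing logic are not specified here.

```python
from collections import defaultdict

def build_preference_pairs(click_log):
    """Group logged guidance candidates by query, then pair each clicked
    guidance (chosen) with each unclicked sibling (rejected).

    Field names and the pairing rule are illustrative assumptions,
    not taken from the paper.
    """
    by_query = defaultdict(list)
    for record in click_log:
        by_query[record["query"]].append(record)

    pairs = []
    for query, records in by_query.items():
        clicked = [r["guidance"] for r in records if r["clicked"]]
        unclicked = [r["guidance"] for r in records if not r["clicked"]]
        for chosen in clicked:
            for rejected in unclicked:
                pairs.append(
                    {"query": query, "chosen": chosen, "rejected": rejected}
                )
    return pairs

log = [
    {"query": "laptop", "guidance": "Compare battery life?", "clicked": True},
    {"query": "laptop", "guidance": "See related news?", "clicked": False},
]
print(build_preference_pairs(log))
```

Pairs like these are the standard input format for preference-based RL fine-tuning, which matches the click-signal-driven optimization the abstract describes.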
Problem

Research questions and friction points this paper is trying to address.

Enhancing proactive guidance in multi-turn industrial search conversations
Adapting dynamically to user goal shifts with low latency
Improving click-through rates via click-signal-driven reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Goal-adaptive Supervised Fine-Tuning for dynamic adaptation
Click-oriented Reinforcement Learning for engagement
Scalable knowledge distillation for low latency
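The latency reduction comes from distilling a large LLM into a lightweight model. The paper's exact distillation objective is not given here; below is a sketch of the classic temperature-scaled soft-label loss often used for this purpose, written in plain Python for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 -- the soft-label term of standard knowledge distillation.
    This is a generic formulation, not necessarily the paper's objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q)) * temperature ** 2
```

When the student matches the teacher exactly the loss is zero; any mismatch yields a positive penalty that trains the small model to mimic the large one's output distribution.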