OlaMind: Towards Human-Like and Hallucination-Safe Customer Service for Retrieval-Augmented Dialogue

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Retrieval-augmented generation (RAG)-based intelligent customer service systems suffer from frequent hallucinations and rigid responses, which degrade user experience and increase business risk. Method: a human-like two-stage learning framework: (1) expert reasoning modeling, which supervises imitation of human inference processes; and (2) cold-start fine-tuning combined with progressive reinforcement learning for difficulty-structured self-refinement, progressing from simple to complex cases. The approach integrates RAG, supervised fine-tuning (SFT), and reinforcement learning (RL) to jointly optimize factual consistency and linguistic naturalness. Contribution/Results: in large-scale online A/B tests, the framework improved intelligent resolution rates by 28.92% in community support and 18.42% in live-streaming interaction scenarios, while reducing human takeover rates by 6.08% and 7.12%, respectively, significantly enhancing system safety and human-likeness.

📝 Abstract
Intelligent customer service (ICS) systems via retrieval-augmented generation (RAG) have been widely adopted in Web-based domains such as social platforms and e-commerce, achieving remarkable improvements in automation and efficiency. However, notable limitations remain: these systems are prone to hallucinations and often generate rigid, mechanical responses, which can introduce business risks and undermine user experience, especially in Web-based customer service interactions in RAG scenarios. In this paper, we introduce OlaMind, a human-like and hallucination-safe customer service framework for retrieval-augmented dialogue. Specifically, it first leverages a Learn-to-Think stage to learn the reasoning processes and response strategies of human experts, and then employs a Learn-to-Respond stage to perform cold-start supervised fine-tuning (SFT) combined with reinforcement learning (RL) for basic-to-hard self-refinement. Our method significantly enhances human-likeness and naturalness while effectively mitigating hallucinations and critical business risks. We have conducted large-scale online A/B experiments in an industry-level social customer service setting, and extensive experimental results show that OlaMind achieves significant cumulative relative improvements, with intelligent resolution rates of +28.92%/+18.42% and human takeover rates of -6.08%/-7.12% in community-support/livestream-interaction scenarios, respectively, which highlights its consistent effectiveness across diverse real-world applications. The code and data will be publicly available.
Problem

Research questions and friction points this paper addresses.

Mitigating hallucinations in retrieval-augmented customer service systems
Reducing rigid mechanical responses in web-based dialogue interactions
Addressing business risks from poor automated customer service quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learn-to-Think stage learns expert reasoning processes
Cold-start SFT with RL enables self-refinement
Framework enhances human-likeness while reducing hallucinations
Tianhong Gao
ByteDance, Beijing, China
Jundong Shen
ByteDance, Beijing, China
Bei Shi
ByteDance, Tencent AI Lab, CUHK
text mining, machine learning, natural language processing, reinforcement learning
Jiapeng Wang
South China University of Technology
document understanding, visual information extraction, multi-modal learning, CLIP, LLM
Ying Ju
Xidian University
Junfeng Yao
ByteDance, Beijing, China
Jiao Ran
ByteDance, Beijing, China
Yong Zhang
ByteDance, Beijing, China
Lin Dong
Zhengzhou University
Spectroscopy, Piezotronics & Piezophotonics, Nanomaterials & Devices
Huiyu Yu
ByteDance, Beijing, China
Tingting Ye
ByteDance, Beijing, China