🤖 AI Summary
Intelligent customer service systems built on retrieval-augmented generation (RAG) frequently hallucinate and produce rigid responses, which degrades user experience and increases business risk. Method: We propose OlaMind, a human-like two-stage learning framework: (1) a Learn-to-Think stage that models expert reasoning, training the system to imitate how human agents reason and choose response strategies; and (2) a Learn-to-Respond stage that combines cold-start supervised fine-tuning (SFT) with progressive reinforcement learning (RL), refining the model on cases ordered from basic to hard. The framework integrates RAG, SFT, and RL to jointly optimize factual consistency and linguistic naturalness. Contribution/Results: In large-scale online A/B tests, the framework achieved cumulative relative improvements in intelligent resolution rate of 28.92% in community-support and 18.42% in livestream-interaction scenarios, and cumulative relative reductions in human takeover rate of 6.08% and 7.12%, respectively. The authors report that the method improves human-likeness while mitigating hallucinations and business risks.
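The basic-to-hard refinement in stage (2) can be pictured as curriculum ordering of the RL training data. The sketch below is an illustration only. It assumes each example carries a difficulty score in [0, 1], estimated as the failure rate of the cold-start SFT model over several sampled responses. The names `Example`, `difficulty_from_pass_rate`, and `curriculum_stages` are hypothetical; the summary does not say how OlaMind measures difficulty or schedules stages.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    # Hypothetical RL training example: a customer query plus a difficulty score.
    query: str
    difficulty: float  # assumed to lie in [0, 1]; higher means harder


def difficulty_from_pass_rate(successes: int, samples: int) -> float:
    """Assumed difficulty estimate: failure rate of the cold-start SFT model."""
    if samples <= 0:
        raise ValueError("samples must be positive")
    return 1.0 - successes / samples


def curriculum_stages(examples: List[Example], num_stages: int) -> List[List[Example]]:
    """Sort examples from basic to hard and split them into num_stages buckets."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    # Stable sort: examples with equal difficulty keep their input order.
    ordered = sorted(examples, key=lambda ex: ex.difficulty)
    size, remainder = divmod(len(ordered), num_stages)
    stages: List[List[Example]] = []
    start = 0
    for i in range(num_stages):
        # The first `remainder` buckets take one extra example each.
        end = start + size + (1 if i < remainder else 0)
        stages.append(ordered[start:end])
        start = end
    return stages
```

Under this assumed scheme, RL would train on `stages[0]` first and move to later, harder buckets as training proceeds. Disjoint, near-equal buckets are simply the easiest choice to show. Replaying earlier examples in later stages is an equally plausible alternative.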
📝 Abstract
Intelligent customer service (ICS) systems built on retrieval-augmented generation (RAG) have been widely adopted in Web-based domains such as social platforms and e-commerce, substantially improving automation and efficiency. However, notable limitations remain: these systems are prone to hallucinations and often generate rigid, mechanical responses, which can introduce business risks and undermine user experience, especially in RAG-based Web customer service interactions. In this paper, we introduce OlaMind, a human-like and hallucination-safe customer service framework for retrieval-augmented dialogue. It first uses a Learn-to-Think stage to learn reasoning processes and response strategies from human experts. It then uses a Learn-to-Respond stage that performs cold-start supervised fine-tuning (SFT) combined with reinforcement learning (RL) for basic-to-hard self-refinement. Our method significantly enhances human-likeness and naturalness while effectively mitigating hallucinations and critical business risks. We conducted large-scale online A/B experiments in an industry-level social customer service setting. In the community-support and livestream-interaction scenarios, OlaMind achieves cumulative relative gains of +28.92% and +18.42% in intelligent resolution rate and cumulative relative reductions of 6.08% and 7.12% in human takeover rate, respectively, demonstrating consistent effectiveness across diverse real-world applications. The code and data will be made publicly available.
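The reported figures are relative changes in rate metrics between the control and treatment arms of an A/B test. A minimal sketch of that arithmetic follows. The abstract does not say how the cumulative figures are aggregated. `compound_lifts` assumes that per-launch lifts compound multiplicatively as (1 + r1)(1 + r2)... - 1. Any numbers used with these hypothetical helpers are illustrative, not from the paper.

```python
import math
from typing import Iterable


def relative_change(control: float, treatment: float) -> float:
    """Relative change of a rate metric in percent: (treatment - control) / control * 100."""
    if control == 0:
        raise ValueError("control rate must be non-zero")
    return (treatment - control) / control * 100.0


def compound_lifts(lifts_percent: Iterable[float]) -> float:
    """Assumed aggregation: compound successive relative lifts (in percent) multiplicatively."""
    factor = math.prod(1.0 + lift / 100.0 for lift in lifts_percent)
    return (factor - 1.0) * 100.0
```

Consider a hypothetical control resolution rate of 0.40 that rises to 0.50. That is a relative improvement of about +25%. Two successive +10% lifts compound to about +21%, not +20%. A drop in human takeover rate produces a negative relative change.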