🤖 AI Summary
Enterprise-grade conversational AI systems face critical challenges, including high latency, frequent hallucinations, and delayed domain-knowledge updates. To address these, we propose a hybrid dialogue system that integrates retrieval-augmented generation (RAG) with intent-driven predefined responses. A novel dynamic routing mechanism adaptively dispatches queries based on semantic intent and confidence estimation. The system further incorporates dialogue state tracking, context-aware response generation, and closed-loop feedback learning to ensure multi-turn consistency and continuous improvement. It supports iterative intent evolution, online confidence-threshold tuning, and automatic expansion of response coverage. Evaluated in real-world enterprise deployments, the system achieves 95% accuracy with an average latency of 180 ms, significantly outperforming both pure RAG and pure intent-based baselines while demonstrating robustness, low-latency responsiveness, and scalable extensibility.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated domain-specific knowledge. This paper introduces a novel hybrid framework that integrates RAG with intent-based canned responses, serving predefined high-confidence responses for efficiency while dynamically routing complex or ambiguous queries to the RAG pipeline. The framework employs a dialogue context manager to ensure coherence in multi-turn interactions and incorporates a feedback loop to refine intents, dynamically adjust confidence thresholds, and expand response coverage over time. Experimental results demonstrate that the proposed framework achieves both high accuracy (95%) and low latency (180 ms), outperforming pure RAG and pure intent-based systems across diverse query types and positioning it as a scalable, adaptive solution for enterprise conversational AI applications.
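The routing decision at the heart of this design can be sketched in a few lines: classify the query's intent with a confidence score, serve a canned response when confidence clears the threshold, and otherwise fall back to the slower RAG pipeline. The sketch below is a minimal illustration, not the paper's implementation; the keyword-overlap classifier, the `CANNED_RESPONSES` table, and all function names are hypothetical stand-ins (a real system would use an embedding- or model-based intent classifier).

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class IntentMatch:
    intent: str
    confidence: float

# Hypothetical canned-response table; a real deployment would load
# this from an intent database maintained by the feedback loop.
CANNED_RESPONSES = {
    "reset_password": "You can reset your password at Settings > Security.",
    "business_hours": "Our support team is available 9am-6pm, Mon-Fri.",
}

def classify_intent(query: str) -> IntentMatch:
    """Toy intent classifier: keyword overlap stands in for a real
    semantic classifier; confidence is the fraction of keywords hit."""
    keywords = {
        "reset_password": {"reset", "password", "forgot"},
        "business_hours": {"hours", "open", "support"},
    }
    tokens = set(query.lower().split())
    best_intent, best_score = "unknown", 0.0
    for intent, kws in keywords.items():
        score = len(tokens & kws) / len(kws)
        if score > best_score:
            best_intent, best_score = intent, score
    return IntentMatch(best_intent, best_score)

def route_query(query: str,
                rag_pipeline: Callable[[str], str],
                threshold: float = 0.6) -> str:
    """Serve a predefined response when intent confidence clears the
    threshold; otherwise dispatch to the (slower) RAG pipeline."""
    match = classify_intent(query)
    if match.confidence >= threshold and match.intent in CANNED_RESPONSES:
        return CANNED_RESPONSES[match.intent]
    return rag_pipeline(query)

# Usage: a lambda stub stands in for the RAG pipeline.
answer = route_query("I forgot my password and need a reset",
                     rag_pipeline=lambda q: f"[RAG answer for: {q}]")
```

The online confidence-threshold tuning described above would adjust `threshold` per intent from feedback signals, tightening it for intents with misroute reports and relaxing it as canned coverage proves reliable.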