Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

📅 2025-06-02
🤖 AI Summary
Enterprise conversational AI systems face high latency, frequent hallucinations, and slow incorporation of updated domain knowledge. To address these, we propose a hybrid dialogue system that combines retrieval-augmented generation (RAG) with intent-driven predefined responses. A dynamic routing mechanism dispatches each query based on its detected intent and a confidence estimate: high-confidence matches receive a predefined response, while complex or ambiguous queries go to the RAG pipeline. A dialogue context manager keeps multi-turn interactions coherent, and a feedback loop refines intents, adjusts confidence thresholds, and expands response coverage over time. In the reported experiments, the system achieves 95% accuracy at a latency of 180 ms, outperforming both pure RAG and pure intent-based baselines across diverse query types.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems and large language model (LLM)-powered chatbots have significantly advanced conversational AI by combining generative capabilities with external knowledge retrieval. Despite their success, enterprise-scale deployments face critical challenges, including diverse user queries, high latency, hallucinations, and difficulty integrating frequently updated domain-specific knowledge. This paper introduces a novel hybrid framework that integrates RAG with intent-based canned responses, leveraging predefined high-confidence responses for efficiency while dynamically routing complex or ambiguous queries to the RAG pipeline. Our framework employs a dialogue context manager to ensure coherence in multi-turn interactions and incorporates a feedback loop to refine intents, dynamically adjust confidence thresholds, and expand response coverage over time. Experimental results demonstrate that the proposed framework achieves a balance of high accuracy (95%) and low latency (180ms), outperforming RAG and intent-based systems across diverse query types, positioning it as a scalable and adaptive solution for enterprise conversational AI applications.
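
The routing idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `route_query`, `toy_classify`, and `CANNED`, the keyword-overlap confidence score, and the 0.8 threshold are all assumptions made for illustration. It only shows the control flow described above, where a high-confidence intent match gets a predefined response and everything else falls through to a RAG pipeline (stubbed here as a plain function).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class RouteResult:
    source: str              # "canned" or "rag"
    intent: Optional[str]    # matched intent, or None when RAG answered
    response: str


# Hypothetical predefined responses keyed by intent name.
CANNED: Dict[str, str] = {
    "reset_password": "Use the 'Forgot password' link on the sign-in page.",
    "billing_hours": "Billing support is available on weekdays.",
}


def toy_classify(query: str) -> Tuple[str, float]:
    """Stand-in intent classifier: keyword overlap as a crude confidence."""
    keywords = {
        "reset_password": {"reset", "password"},
        "billing_hours": {"billing", "hours"},
    }
    tokens = set(query.lower().replace("?", "").split())
    best_intent, best_score = "unknown", 0.0
    for intent, words in keywords.items():
        score = len(tokens & words) / len(words)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


def route_query(
    query: str,
    classify: Callable[[str], Tuple[str, float]],
    canned: Dict[str, str],
    rag: Callable[[str], str],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> RouteResult:
    """Serve a canned response on a confident intent match, else call RAG."""
    intent, confidence = classify(query)
    if intent in canned and confidence >= threshold:
        return RouteResult("canned", intent, canned[intent])
    return RouteResult("rag", None, rag(query))


def stub_rag(query: str) -> str:
    """Placeholder for a retrieval-plus-generation pipeline."""
    return f"[RAG answer for: {query}]"


fast = route_query("How do I reset my password?", toy_classify, CANNED, stub_rag)
slow = route_query("Why was my password change rejected?", toy_classify, CANNED, stub_rag)
```

In this sketch the first query matches both keywords of `reset_password` and is answered from `CANNED`, while the second matches only one keyword, falls below the threshold, and is sent to `stub_rag`. A production system would replace `toy_classify` with a trained intent model and pass dialogue context along with the query.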

Problem

Research questions and friction points this paper is trying to address.

Addressing high latency and hallucinations in enterprise conversational AI
Integrating frequently updated domain-specific knowledge into AI responses
Balancing accuracy and efficiency in multi-turn online conversations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid framework combining RAG and intent-based responses
Dynamic routing for complex or ambiguous queries
Feedback loop refining intents and adjusting thresholds
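
As a rough illustration of how a feedback loop might adjust a routing confidence threshold, the sketch below nudges the threshold up when a predefined response is marked unhelpful and relaxes it slowly when one is marked helpful. The update rule, the class name `ThresholdTuner`, and every numeric value are hypothetical; the summary does not describe the paper's actual adaptation method.

```python
from dataclasses import dataclass


@dataclass
class ThresholdTuner:
    """Hypothetical online tuner for the canned-vs-RAG confidence threshold."""
    threshold: float = 0.80   # assumed starting point
    step: float = 0.02        # assumed adjustment size
    lower: float = 0.50       # assumed clamp bounds
    upper: float = 0.95

    def update(self, served_canned: bool, helpful: bool) -> float:
        """Apply one piece of user feedback and return the new threshold."""
        if served_canned and not helpful:
            # A predefined response missed: require more confidence next time.
            self.threshold += self.step
        elif served_canned and helpful:
            # A predefined response worked: relax the threshold slowly.
            self.threshold -= self.step / 4
        # Feedback on RAG answers leaves the threshold unchanged in this sketch.
        self.threshold = min(self.upper, max(self.lower, self.threshold))
        return self.threshold


tuner = ThresholdTuner()
after_miss = tuner.update(served_canned=True, helpful=False)
after_hit = tuner.update(served_canned=True, helpful=True)
```

The asymmetric step sizes reflect one possible design choice: a wrong predefined answer is treated as more costly than an unnecessary RAG call, so the threshold rises faster than it falls. Refining intents and expanding response coverage would need additional logic that is not shown here.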