AI Summary
To address performance degradation, sparse human feedback, and the difficulty of continual learning for enterprise AI agents built on Retrieval-Augmented Generation (RAG), this paper proposes a data flywheel system built on the MAPE (Monitor-Analyze-Plan-Execute) control loop, marking the first explicit integration of human feedback into a closed-loop RAG optimization framework and giving AI agents the ability to evolve from their own usage. Methodologically, it combines a Mixture-of-Experts (MoE) architecture, NeMo-based microservices, targeted parameter-efficient fine-tuning, and a new mechanism for jointly optimizing query routing and query reformulation. Evaluated over three months in a production environment, the system used 495 real negative feedback instances to achieve, for routing, 96% accuracy, a 10× reduction in model size, and a 70% reduction in latency; and, for query reformulation, a 3.7% improvement in accuracy and a 40% reduction in latency. This work establishes a practical, feedback-driven closed-loop paradigm for the continual evolution of RAG systems when user feedback is scarce.
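The MAPE loop described above can be pictured as four small stages chained over a batch of user feedback. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The `Feedback` record, the failure-mode labels, the `min_samples` threshold, and the job names returned by `execute` are all hypothetical, and the Execute stage only returns placeholder job identifiers instead of calling NeMo microservices.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Feedback:
    """One user interaction with the assistant (hypothetical schema)."""
    query: str
    negative: bool                      # True for a thumbs-down style signal
    failure_mode: Optional[str] = None  # e.g. "routing" or "rephrasal", set during triage


def monitor(records: Iterable[Feedback]) -> List[Feedback]:
    """Monitor: keep only negative feedback for further analysis."""
    return [r for r in records if r.negative]


def analyze(negatives: Iterable[Feedback]) -> Counter:
    """Analyze: count how often each labeled failure mode occurs."""
    return Counter(r.failure_mode for r in negatives if r.failure_mode)


def plan(failure_counts: Counter, min_samples: int) -> List[str]:
    """Plan: target only components with enough failures to learn from."""
    return sorted(mode for mode, n in failure_counts.items() if n >= min_samples)


def execute(targets: List[str]) -> Dict[str, str]:
    """Execute: placeholder that names one fine-tuning job per target component."""
    return {t: f"finetune-{t}" for t in targets}


def mape_iteration(records: Iterable[Feedback], min_samples: int = 2) -> Dict[str, str]:
    """Run one pass of Monitor -> Analyze -> Plan -> Execute."""
    return execute(plan(analyze(monitor(records)), min_samples))
```

Each pass closes the loop: the models produced by the Execute stage serve new traffic, and the feedback they receive becomes the input to the next Monitor step. The `min_samples` threshold reflects the low-feedback setting, since it avoids launching a fine-tuning job for a failure mode seen only once or twice.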
Abstract
Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25%) and query rephrasal errors (3.2%). Using NVIDIA NeMo microservices, we implemented targeted improvements through fine-tuning. For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96% accuracy, a 10x reduction in model size, and a 70% reduction in latency. For query rephrasal, fine-tuning yielded a 3.7% gain in accuracy and a 40% reduction in latency. Our approach demonstrates how human-in-the-loop (HITL) feedback, when structured within a data flywheel, transforms enterprise AI agents into self-improving systems. Key learnings include approaches for ensuring agent robustness despite limited user feedback, navigating privacy constraints, and executing staged rollouts in production. This work offers a repeatable blueprint for building robust, adaptive enterprise AI agents capable of learning from real-world usage at scale.
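The abstract mentions executing staged rollouts in production. One common way to stage such a rollout, shown here as an assumed technique rather than the paper's documented procedure, is deterministic hash-based bucketing. Each user is hashed to a stable number in [0, 1), and only users below the current rollout fraction are served by the candidate model. The model labels `baseline-70b` and `candidate-8b`, the salt string, and the function names are hypothetical.

```python
import hashlib


def rollout_bucket(user_id: str, salt: str = "router-rollout") -> float:
    """Map a user id to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and strictly below 1.0.
    value = int.from_bytes(digest[:8], "big") >> 11
    return value / 2**53


def choose_router(user_id: str, rollout_fraction: float) -> str:
    """Serve a stable fraction of users with the candidate router model."""
    if not 0.0 <= rollout_fraction <= 1.0:
        raise ValueError("rollout_fraction must be in [0, 1]")
    if rollout_bucket(user_id) < rollout_fraction:
        return "candidate-8b"   # hypothetical label for the fine-tuned router
    return "baseline-70b"       # hypothetical label for the existing router
```

Because the bucket depends only on the user id and the salt, raising the fraction in stages (for example 5%, then 25%, then 100%) keeps every early user on the candidate. Each stage's cohort is therefore a superset of the previous one, which keeps user experience consistent while metrics are compared.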