🤖 AI Summary
This work addresses the challenge of indexing and retrieving free-form factual corrections in complex B2B settings, where user-provided feedback is often difficult to integrate into retrieval-augmented generation (RAG) systems. The authors propose Iterative Nugget Optimization (INO), a method that converts user feedback into compact "factual nuggets" at index time and improves their discoverability through an iterative loop of query testing, failure analysis, and automated revision. The authors present INO as the first production-grade RAG framework to implement a closed-loop, feedback-driven approach to automated knowledge refinement. Combining retrieval-augmented generation, query rewriting, and failure tracing, INO improves both how often factual corrections are retrieved and how effectively they are reused in two real-world B2B knowledge agent systems, outperforming baseline methods in both automated metrics and human evaluations.
📝 Abstract
Agentic retrieval-augmented generation (RAG) systems in complex B2B (business-to-business) settings often receive free-form feedback on their responses. Rather than generic feedback signals such as style, preference, or overall response quality, we focus on actionable factual corrections. We identify these instances and convert them into compact knowledge-base entries, which we call factual nuggets. We introduce Iterative Nugget Optimization (INO), an index-time optimization method that uses the production agentic RAG as a test harness: it creates an initial nugget, probes it with the triggering query and paraphrases, reflects over failed retrieval and answer traces, and revises the nugget until it is discoverable. We evaluate INO with two production B2B knowledge-assistance agents across multiple companies that use our system: a product support agent that answers questions over company-specific knowledge bases, and a support ticket agent that assists support engineers. INO consistently improves over baselines in the discoverability and usage of factual corrections, in both automated and human evaluations.
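To make the probe, reflect, and revise loop concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a toy lexical retriever in place of the production agentic RAG, and it replaces the reflection step (presumably model-driven in the real system) with a simple heuristic that appends terms from failed probes. All names (`retrieve`, `optimize_nugget`, `kb`, `probes`) and the example data are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, index: List[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank entries by token overlap with the query.

    sorted() is stable, so on ties the earlier (older) entries win.
    """
    q = tokens(query)
    ranked = sorted(index, key=lambda doc: len(q & tokens(doc)), reverse=True)
    return ranked[:k]


def optimize_nugget(nugget: str, probes: List[str], index: List[str],
                    max_rounds: int = 3) -> str:
    """Revise a nugget until every probe retrieves it, or rounds run out.

    Returns the last revision, which may still fail if max_rounds is hit.
    """
    for _ in range(max_rounds):
        candidate_index = index + [nugget]
        # Probe: which queries fail to surface the nugget?
        failed = [q for q in probes if nugget not in retrieve(q, candidate_index)]
        if not failed:
            break
        # Stand-in for reflection: collect probe terms the nugget lacks.
        probe_terms = set().union(*(tokens(q) for q in failed))
        missing = sorted(probe_terms - tokens(nugget))
        # Revise: extend the nugget so those terms become matchable.
        nugget = nugget + " Related terms: " + ", ".join(missing) + "."
    return nugget


# Hypothetical knowledge base with an outdated entry, plus a user correction.
kb = [
    "The export limit for reports is 500 rows per file.",
    "Password resets expire after 24 hours.",
]
initial = "Correction: exports are capped at 10,000 rows."
probes = [
    "What is the export limit for reports?",      # triggering query
    "How many rows can a report file contain?",   # paraphrase
]
revised = optimize_nugget(initial, probes, kb)
```

In this toy setup the initial nugget loses to the outdated entry on both probes because it shares almost no vocabulary with them; after one revision it is the top result for each. A real system would judge discoverability through the full agent trace rather than token overlap.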