🤖 AI Summary
This work addresses two problems that existing retrieval-augmented generation (RAG) systems commonly face when integrating information: ambiguous citation provenance and redundant content. The authors propose Crucible, which builds a knowledge base of Q&A nuggets and uses their explicit question-answer semantics to guide information extraction, selection, and generation while preserving source attribution throughout the pipeline. Instead of opaque cluster abstractions, the method uses interpretable Q&A fragments as structured intermediate representations, so reasoning and generation remain traceable end to end. On the TREC NeuCLIR 2024 collection, the approach substantially outperforms Ginger, a recent nugget-based RAG system, in nugget recall, density, and citation grounding.
📝 Abstract
RAGE systems integrate ideas from automatic evaluation (E) into retrieval-augmented generation (RAG). As one such example, we present Crucible, a Nugget-Augmented Generation system that constructs a bank of Q&A nuggets from retrieved documents and uses them to guide extraction, selection, and report generation. Reasoning over nuggets avoids repeated information by relying on clear, interpretable Q&A semantics rather than opaque cluster abstractions, while maintaining explicit citation provenance throughout the entire generation process. Evaluated on the TREC NeuCLIR 2024 collection, Crucible substantially outperforms Ginger, a recent nugget-based RAG system, in nugget recall, density, and citation grounding.
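The abstract describes a bank of Q&A nuggets that deduplicates repeated information and carries citations into the generated report. The following is a minimal sketch of that idea, not Crucible's implementation. It assumes that two extractions express the same fact when their question and answer match after crude text normalization; Crucible itself presumably relies on an LLM for extraction and matching. The names `Nugget`, `NuggetBank`, `normalize`, and `render_report`, the "most-cited first" selection rule, and the example facts are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Nugget:
    """Hypothetical Q&A nugget: one question, one answer, and its supporting documents."""
    question: str
    answer: str
    citations: set[str] = field(default_factory=set)


def normalize(text: str) -> str:
    # Assumption: crude stand-in for semantic matching.
    # Lowercase, drop punctuation, and collapse whitespace.
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


class NuggetBank:
    """Keyed by (normalized question, normalized answer), so a repeated fact is merged."""

    def __init__(self) -> None:
        self._nuggets: dict[tuple[str, str], Nugget] = {}

    def add(self, question: str, answer: str, doc_id: str) -> None:
        key = (normalize(question), normalize(answer))
        if key in self._nuggets:
            # Same fact from another document: keep one nugget, record the extra source.
            self._nuggets[key].citations.add(doc_id)
        else:
            self._nuggets[key] = Nugget(question, answer, {doc_id})

    def nuggets(self) -> list[Nugget]:
        return list(self._nuggets.values())


def render_report(bank: NuggetBank, max_nuggets: int) -> list[str]:
    # Hypothetical selection rule: prefer nuggets supported by more documents.
    selected = sorted(bank.nuggets(), key=lambda n: (-len(n.citations), n.question))
    # Each report sentence carries the citations of the nugget it came from.
    return [f"{n.answer} [{', '.join(sorted(n.citations))}]" for n in selected[:max_nuggets]]


if __name__ == "__main__":
    bank = NuggetBank()
    bank.add("Who launched the probe?", "ESA", "doc1")
    bank.add("who launched the probe", "ESA.", "doc3")  # duplicate fact, new source
    bank.add("When was it launched?", "In 2023", "doc2")
    print(render_report(bank, 2))  # → ['ESA [doc1, doc3]', 'In 2023 [doc2]']
```

Because the bank stores each fact once and accumulates the source documents that support it, a repeated fact adds a citation rather than a second sentence. This mirrors the abstract's point that nugget semantics avoid redundancy while keeping provenance attached to every generated statement.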