🤖 AI Summary
Traditional RAG systems struggle with knowledge evolution, factual obsolescence, and source conflicts because the units they retrieve (passages or propositions) do not match the units they are evaluated on. This work proposes NuggetIndex, which introduces atomic "nugget" information units with lifecycle management; each nugget carries evidence links, a temporal validity interval, and a state label. NuggetIndex uses time- and state-aware checks to filter obsolete or deprecated nuggets before ranking. This avoids the recall collapse observed with conventional time-filtered baselines and substantially reduces generator input length. Across multiple benchmarks, NuggetIndex achieves a 42% improvement in nugget recall, a 9-percentage-point gain in temporal correctness, a 55% reduction in conflict rate, and a 64% decrease in generator input length.
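The following minimal Python sketch illustrates the record-plus-filter idea. It assumes a nugget is a small record with an ID, text, evidence IDs, a half-open validity interval, and a lifecycle state. The names `Nugget`, `State`, `is_usable`, `retrieve`, and the toy word-overlap scorer are hypothetical illustrations, not NuggetIndex's actual schema or ranker. The sketch only shows the ordering: invalid or deprecated records are dropped before any ranking happens.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class State(Enum):
    # Hypothetical lifecycle labels; the source names only "deprecated" explicitly.
    ACTIVE = "active"
    DEPRECATED = "deprecated"


@dataclass
class Nugget:
    nugget_id: str
    text: str
    evidence: List[str]        # IDs of supporting passages (hypothetical format)
    valid_from: date           # inclusive start of validity
    valid_to: Optional[date]   # exclusive end; None means still valid
    state: State = State.ACTIVE


def is_usable(n: Nugget, as_of: date) -> bool:
    """True if the nugget is active and its validity interval covers `as_of`."""
    if n.state is not State.ACTIVE:
        return False
    if as_of < n.valid_from:
        return False
    return n.valid_to is None or as_of < n.valid_to


def overlap_score(query: str, text: str) -> int:
    # Toy lexical scorer standing in for whatever ranker the real system uses.
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, nuggets: List[Nugget], as_of: date, k: int = 3) -> List[Nugget]:
    # Filter first, then rank only the nuggets that survive the filter.
    candidates = [n for n in nuggets if is_usable(n, as_of)]
    candidates.sort(key=lambda n: overlap_score(query, n.text), reverse=True)
    return candidates[:k]


# Toy index: n1 is superseded by n2, n4 is deprecated, n3 is old but still valid.
index = [
    Nugget("n1", "The CEO of Acme is Alice", ["doc7"], date(2015, 1, 1), date(2021, 6, 1)),
    Nugget("n2", "The CEO of Acme is Bob", ["doc9"], date(2021, 6, 1), None),
    Nugget("n3", "Acme is headquartered in Oslo", ["doc2"], date(2010, 1, 1), None),
    Nugget("n4", "Acme CEO is Carol", ["doc5"], date(2021, 6, 1), None, State.DEPRECATED),
]
top = retrieve("who is the CEO of Acme", index, as_of=date(2024, 1, 1), k=1)
print([n.nugget_id for n in top])  # ['n2']
```

Inclusion here depends on each nugget's own validity interval rather than on how old its source is. As a result, a long-standing but still-valid fact such as `n3` survives the filter, while the superseded `n1` does not. This is one plausible reading of how per-record validity avoids the recall collapse of coarser time filters.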
📝 Abstract
Retrieval-augmented generation (RAG) systems are frequently evaluated via fact-based metrics, yet standard implementations retrieve passages or static propositions. This unit mismatch between evaluation and retrieval objects hinders maintenance when corpora evolve and fails to capture superseded facts or source disagreements. We propose NuggetIndex, a retrieval system that stores atomic information units, called nuggets, as managed records. Each record maintains links to evidence, a temporal validity interval, and a lifecycle state. By filtering invalid or deprecated nuggets prior to ranking, the system prevents the inclusion of outdated information. We evaluate the approach using a nuggetized MS MARCO subset, a temporal Wikipedia QA dataset, and a multi-hop QA task. Against passage and unmanaged proposition retrieval baselines, NuggetIndex improves nugget recall by 42%, increases temporal correctness by 9 percentage points without the recall collapse observed in time-filtered baselines, and reduces conflict rates by 55%. The compact nugget format reduces generator input length by 64% while enabling lightweight index structures suitable for browser-based and resource-constrained deployment. We release our implementation, datasets, and evaluation scripts.
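The abstract appears to report nugget recall as a relative gain (42%) but temporal correctness as an absolute gain (9 percentage points). The following sketch shows the difference on toy numbers. It assumes nugget recall is the fraction of gold nuggets whose IDs appear in the retrieved set. `nugget_recall`, `relative_gain`, and `point_gain` are hypothetical helpers, and real nugget evaluation typically relies on semantic or judged matching rather than exact IDs.

```python
from typing import Iterable, Set


def nugget_recall(gold: Set[str], retrieved: Iterable[str]) -> float:
    """Share of gold nugget IDs that appear among the retrieved IDs (exact match)."""
    if not gold:
        return 0.0
    return len(gold & set(retrieved)) / len(gold)


def relative_gain(new: float, old: float) -> float:
    """Relative change: 1.0 means `new` is twice `old`."""
    return (new - old) / old


def point_gain(new: float, old: float) -> float:
    """Absolute change in percentage points for scores in [0, 1]."""
    return (new - old) * 100


# Toy data, not results from the paper.
gold = {"g1", "g2", "g3", "g4", "g5"}
baseline = nugget_recall(gold, ["g1", "g2", "x9"])       # 2 of 5 gold nuggets found
managed = nugget_recall(gold, ["g1", "g2", "g3", "g4"])  # 4 of 5 gold nuggets found

print(f"relative gain: {relative_gain(managed, baseline):.0%}")  # 100%
print(f"point gain:    {point_gain(managed, baseline):.0f} pp")  # 40 pp
```

Doubling a recall of 0.4 is a 100% relative gain but only a 40-point absolute gain. The 42% and 9-point figures are therefore likely not on the same scale and should not be compared directly.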