VersionRAG: Version-Aware Retrieval-Augmented Generation for Evolving Documents

📅 2025-10-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional RAG systems answer version-sensitive queries over evolving technical documentation poorly, mainly because they ignore the temporal validity of documents. This paper proposes VersionRAG, a version-aware RAG framework. It builds a hierarchical version graph to model document evolution, uses an intent classifier to route each query to a specialized retrieval path, and combines version-aware filtering with change tracking to detect both explicit and implicit modifications. Indexing requires 97% fewer tokens than GraphRAG. On the VersionQA benchmark, the framework reaches 90% question-answering accuracy, compared with 58% for naive RAG and 64% for GraphRAG, and 60% accuracy on implicit change detection (baselines: 0-10%), which shows it can track undocumented modifications.
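
To make the idea of implicit change detection concrete, the following is a minimal sketch and not the paper's method. It assumes each version is a list of text lines and that a changelog lists the documented changes. The function names (`added_lines`, `implicit_changes`) and the word-overlap heuristic are hypothetical stand-ins for whatever comparison the framework actually performs.

```python
import difflib


def added_lines(old, new):
    """Return lines that appear in `new` as insertions or replacements."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            added.extend(new[j1:j2])
    return added


def implicit_changes(old, new, changelog, min_len=5):
    """Return changed lines that no changelog entry appears to mention.

    Assumption: a line counts as documented if it shares at least one
    word of `min_len` or more characters with the changelog. This is a
    toy heuristic chosen only for illustration.
    """
    log_words = {w for entry in changelog for w in entry.lower().split()}
    implicit = []
    for line in added_lines(old, new):
        key_terms = {w for w in line.lower().split() if len(w) >= min_len}
        if not key_terms & log_words:
            implicit.append(line)
    return implicit


old_version = [
    "timeout defaults to 30 seconds",
    "retries are disabled automatically",
]
new_version = [
    "timeout defaults to 60 seconds",
    "retries are enabled automatically",
]
changelog = ["Increased the timeout to 60 seconds"]

undocumented = implicit_changes(old_version, new_version, changelog)
```

Here the timeout change is covered by the changelog, while the retry change is not, so only the retry line is reported as an implicit change.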

📝 Abstract

Retrieval-Augmented Generation (RAG) systems fail when documents evolve through versioning, a ubiquitous characteristic of technical documentation. Existing approaches achieve only 58-64% accuracy on version-sensitive questions, retrieving semantically similar content without temporal validity checks. We present VersionRAG, a version-aware RAG framework that explicitly models document evolution through a hierarchical graph structure capturing version sequences, content boundaries, and changes between document states. During retrieval, VersionRAG routes queries through specialized paths based on intent classification, enabling precise version-aware filtering and change tracking. On our VersionQA benchmark (100 manually curated questions across 34 versioned technical documents), VersionRAG achieves 90% accuracy, outperforming naive RAG (58%) and GraphRAG (64%). VersionRAG reaches 60% accuracy on implicit change detection where baselines fail (0-10%), demonstrating its ability to track undocumented modifications. Additionally, VersionRAG requires 97% fewer tokens during indexing than GraphRAG, making it practical for large-scale deployment. Our work establishes versioned document QA as a distinct task and provides both a solution and benchmark for future research.
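
The hierarchical structure and version-aware filtering described in the abstract can be illustrated with a small sketch. This is an assumption-laden toy, not the authors' implementation: the class names (`DocumentNode`, `VersionNode`), the `retrieve` function, and the word-overlap score (a stand-in for embedding similarity) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionNode:
    """One version of a document: its content chunks and recorded changes."""
    version: str
    chunks: list = field(default_factory=list)
    changes: list = field(default_factory=list)  # changes vs. the previous version


@dataclass
class DocumentNode:
    """A document whose versions are stored in release order."""
    name: str
    versions: list = field(default_factory=list)

    def get(self, version: str) -> Optional[VersionNode]:
        return next((v for v in self.versions if v.version == version), None)


def overlap(query: str, text: str) -> int:
    """Toy relevance score: count of shared lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(doc: DocumentNode, query: str, version: Optional[str] = None, k: int = 1):
    """Return the top-k (version, chunk) pairs.

    With `version` set, only chunks from that version are candidates,
    so similar text from other versions cannot be returned.
    """
    if version is None:
        pool = [(v.version, c) for v in doc.versions for c in v.chunks]
    else:
        node = doc.get(version)
        pool = [(node.version, c) for c in node.chunks] if node else []
    ranked = sorted(pool, key=lambda vc: overlap(query, vc[1]), reverse=True)
    return ranked[:k]


doc = DocumentNode(
    name="client-library",
    versions=[
        VersionNode("1.0", chunks=["the default timeout is 30 seconds"]),
        VersionNode(
            "2.0",
            chunks=["the default timeout is 60 seconds"],
            changes=["default timeout raised from 30 to 60 seconds"],
        ),
    ],
)

query = "what is the default timeout in v2.0"
filtered = retrieve(doc, query, version="2.0")
```

Both chunks score identically under the toy similarity measure, so an unfiltered search has no basis for preferring the 2.0 text. Restricting the candidate pool to the requested version removes that ambiguity, which is the failure mode the abstract attributes to version-unaware retrieval.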

Problem

Research questions and friction points this paper is trying to address.

Addresses RAG failures on documents that evolve through versioning

Closes the temporal-validity gap in retrieval, where semantically similar content from the wrong version is returned

Enables version-aware filtering for evolving technical documentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Models document evolution via hierarchical graph structure

Routes queries through intent-based specialized paths

Achieves 90% accuracy on VersionQA while using 97% fewer indexing tokens than GraphRAG
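
As a rough illustration of intent-based routing, the sketch below classifies a query with keyword rules and a version-number pattern. The source does not describe how the classifier works, so the intent labels (`change_tracking`, `version_filtered`, `general`), the keyword list, and the regular expression are all assumptions made for this example.

```python
import re

# Matches dotted version numbers such as "1.2" or "v2.0.1" (assumed format).
VERSION_RE = re.compile(r"\bv?(\d+(?:\.\d+)+)\b", re.IGNORECASE)

# Hypothetical cue words suggesting the user asks about changes.
CHANGE_WORDS = ("change", "difference", "added", "removed", "deprecated")


def classify_intent(query: str):
    """Return (intent, versions_mentioned) for a query.

    Rule order: change cues win over a bare version mention, and a
    query with neither falls back to ordinary retrieval.
    """
    versions = VERSION_RE.findall(query)
    lowered = query.lower()
    if any(word in lowered for word in CHANGE_WORDS):
        return "change_tracking", versions
    if versions:
        return "version_filtered", versions
    return "general", versions


examples = [
    "What changed between 1.2 and 1.3?",
    "What is the default timeout in v2.0?",
    "How do I install the library?",
]
routed = [classify_intent(q) for q in examples]
```

Each intent would then select a different retrieval path: a change query would consult the recorded differences between the named versions, a version-filtered query would search only the named version, and a general query would use ordinary retrieval.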

🔎 Similar Papers

Creating a Taxonomy for Retrieval Augmented Generation Applications