Hierarchical Abstract Tree for Cross-Document Retrieval-Augmented Generation

📅 2026-05-01
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing tree-based retrieval-augmented generation methods struggle with cross-document multi-hop question answering because of poor distributional adaptability, structural isolation, and coarse abstraction granularity. This work proposes a hierarchical abstract tree index that requires no prior distributional assumptions: an iterative “merging and collapse” process adapts to the data distribution, and explicit cross-document links address the isolation of per-document trees. A multi-granular retrieval agent combines query reformulation with hybrid dense-sparse retrieval to support tasks ranging from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, the method outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score.
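To make "hybrid dense-sparse retrieval" concrete, here is a minimal sketch of one common way to combine two retrievers: reciprocal rank fusion (RRF). The summary does not say how the method fuses dense and sparse results, so RRF is a stand-in chosen for illustration, and the document ids, the two input rankings, and the constant `k=60` are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    k=60 is a conventional default, not a value taken from the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of an embedding retriever and a keyword retriever.
dense_ranking = ["d2", "d1", "d3"]
sparse_ranking = ["d1", "d4", "d2"]

fused = reciprocal_rank_fusion([dense_ranking, sparse_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while documents found by only one retriever still remain in the candidate list.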
📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models with external knowledge, and tree-based RAG organizes documents into hierarchical indexes to support queries at multiple granularities. However, existing Tree-RAG methods designed for single-document retrieval face critical challenges in scaling to cross-document multi-hop questions: (1) poor distribution adaptability, where $k$-means clustering introduces noise due to rigid distribution assumptions; (2) structural isolation, as tree indexes lack explicit cross-document connections; and (3) coarse abstraction, which obscures fine-grained details. To address these limitations, we propose $Ψ$-RAG, a tree-RAG framework with two key components. The first is a hierarchical abstract tree index, built through an iterative "merging and collapse" process that adapts to data distributions without a priori assumptions. The second is a multi-granular retrieval agent that interacts with the knowledge base through reorganized queries and an agent-powered hybrid retriever. $Ψ$-RAG supports diverse tasks from token-level question answering to document-level summarization. On cross-document multi-hop QA benchmarks, it outperforms RAPTOR by 25.9% and HippoRAG 2 by 7.4% in average F1 score. Code is available at https://github.com/Newiz430/Psi-RAG.
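As a rough illustration of building a hierarchical index without fixing a cluster count in advance (the limitation attributed to $k$-means above), the sketch below greedily merges the two most similar nodes until a single root remains. This is plain bottom-up agglomerative merging, not the paper's "merging and collapse" procedure, whose details the abstract does not give; the `Node` class, the toy 2-D vectors, and the use of a mean vector in place of an LLM-written abstract are all assumptions.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    vector: list
    children: list = field(default_factory=list)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_tree(leaves):
    """Greedily merge the two most similar nodes until one root remains."""
    nodes = list(leaves)
    while len(nodes) > 1:
        # Pick the most similar pair; no cluster count is specified up front.
        i, j = max(
            ((i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))),
            key=lambda p: cosine(nodes[p[0]].vector, nodes[p[1]].vector),
        )
        a, b = nodes[i], nodes[j]
        # The mean vector stands in for a generated summary of the children.
        mean = [(x + y) / 2 for x, y in zip(a.vector, b.vector)]
        parent = Node(f"({a.label}+{b.label})", mean, [a, b])
        nodes = [n for idx, n in enumerate(nodes) if idx not in (i, j)] + [parent]
    return nodes[0]


# Toy chunk embeddings: A and B are close, C is far from both.
leaves = [Node("A", [1.0, 0.0]), Node("B", [0.9, 0.1]), Node("C", [0.0, 1.0])]
root = build_tree(leaves)
print(root.label)
```

In this toy run, A and B are merged first because they are the closest pair, and C joins only at the root, so the tree shape follows the data rather than a preset number of clusters.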
Problem

Research questions and friction points this paper is trying to address.

cross-document retrieval
multi-hop question answering
tree-based RAG
distribution adaptability
structural isolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Abstract Tree
Cross-Document Retrieval
Multi-Granular Retrieval Agent
Distribution-Adaptive Clustering
Retrieval-Augmented Generation
🔎 Similar Papers
2024-05-26, North American Chapter of the Association for Computational Linguistics, Citations: 31