HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation

📅 2026-02-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of conventional Euclidean dense retrievers: they cannot faithfully model the hierarchical structure of natural language. This leads to the retrieval of semantically irrelevant documents and increases the risk of hallucination in generation. To overcome this, the authors bring hyperbolic geometry into RAG retrievers and propose two models in the Lorentz model of hyperbolic space. HyTE-FH is a fully hyperbolic Transformer. HyTE-H is a hybrid architecture that projects pre-trained Euclidean embeddings into hyperbolic space. The authors also design the Outward Einstein Midpoint, a geometry-aware pooling operator that prevents representational collapse during sequence aggregation and provably preserves hierarchical structure. On MTEB, HyTE-FH outperforms equivalent Euclidean baselines. On RAGBench, HyTE-H achieves up to 29% gains in context relevance and answer relevance while using substantially smaller models than current state-of-the-art retrievers. The resulting embeddings also encode document specificity in their radial norm, which increases by over 20% from general to specific concepts.
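HyTE-H projects pre-trained Euclidean embeddings into the Lorentz model. The sketch below shows one common way to perform such a projection: the exponential map at the origin of the hyperboloid. It also includes the geodesic distance a retriever could rank documents by, and the radial norm (distance from the origin) that the summary links to document specificity.

This is a minimal sketch under stated assumptions. Curvature is fixed at -1, and the helper names (`expmap0`, `lorentz_distance`, `radial_norm`, `rank_documents`) are hypothetical. The paper's actual projection, curvature, and scoring function may differ.

```python
import math
from typing import List

Vector = List[float]  # Lorentz points are [x0, x1, ..., xd]


def lorentz_inner(x: Vector, y: Vector) -> float:
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def expmap0(v: Vector) -> Vector:
    """Exponential map at the origin o = (1, 0, ..., 0) of the hyperboloid
    <x, x>_L = -1 (curvature -1 is an assumption). A Euclidean vector v of
    norm r is sent to (cosh r, sinh r * v / r)."""
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(r) / r
    return [math.cosh(r)] + [scale * a for a in v]


def lorentz_distance(x: Vector, y: Vector) -> float:
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L)."""
    # Clamp at 1.0: rounding can push -<x, y>_L slightly below arccosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def radial_norm(x: Vector) -> float:
    """Distance from the origin, d(o, x) = arccosh(x0)."""
    return math.acosh(max(1.0, x[0]))


def rank_documents(query: Vector, docs: List[Vector]) -> List[int]:
    """Hypothetical scoring rule: doc indices sorted by distance to the query."""
    return sorted(range(len(docs)), key=lambda i: lorentz_distance(query, docs[i]))


if __name__ == "__main__":
    # Toy 2-d "Euclidean embeddings" projected onto the hyperboloid.
    query = expmap0([1.0, 0.0])
    docs = [expmap0([0.0, 1.0]), expmap0([1.2, 0.0]), expmap0([-1.0, 0.0])]
    print(rank_documents(query, docs))  # [1, 0, 2]
    print(radial_norm(expmap0([3.0, 4.0])))  # about 5.0, the norm of [3, 4]
```

`expmap0` preserves distance from the origin: `radial_norm(expmap0(v))` equals the Euclidean norm of `v`. As a result, in a model built this way, any radial separation between general and specific documents would come from the tangent vectors the encoder produces, not from the map itself.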

📝 Abstract
Embedding geometry plays a fundamental role in retrieval quality, yet dense retrievers for retrieval-augmented generation (RAG) remain largely confined to Euclidean space. However, natural language exhibits hierarchical structure from broad topics to specific entities that Euclidean embeddings fail to preserve, causing semantically distant documents to appear spuriously similar and increasing hallucination risk. To address these limitations, we introduce hyperbolic dense retrieval, developing two model variants in the Lorentz model of hyperbolic space: HyTE-FH, a fully hyperbolic transformer, and HyTE-H, a hybrid architecture projecting pre-trained Euclidean embeddings into hyperbolic space. To prevent representational collapse during sequence aggregation, we introduce the Outward Einstein Midpoint, a geometry-aware pooling operator that provably preserves hierarchical structure. On MTEB, HyTE-FH outperforms equivalent Euclidean baselines, while on RAGBench, HyTE-H achieves up to 29% gains over Euclidean baselines in context relevance and answer relevance using substantially smaller models than current state-of-the-art retrievers. Our analysis also reveals that hyperbolic representations encode document specificity through norm-based separation, with over 20% radial increase from general to specific concepts, a property absent in Euclidean embeddings, underscoring the critical role of geometric inductive bias in faithful RAG systems.
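The Outward Einstein Midpoint builds on the Einstein midpoint, a standard way to average points in hyperbolic space. The sketch below implements only the standard weighted Einstein midpoint for Lorentz-model points with curvature -1. It does not implement the paper's outward variant, whose exact definition the abstract does not give. Helper names are hypothetical.

The sketch illustrates one concrete form of the collapse the abstract warns about. Averaging token embeddings that point in different directions pulls the pooled vector toward the origin, which shrinks its radial norm.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]  # Lorentz points are [x0, x1, ..., xd], curvature -1 assumed


def expmap0(v: Vector) -> Vector:
    """Exponential map at the hyperboloid origin: v of norm r -> (cosh r, sinh r * v / r)."""
    r = math.sqrt(sum(a * a for a in v))
    if r == 0.0:
        return [1.0] + [0.0] * len(v)
    scale = math.sinh(r) / r
    return [math.cosh(r)] + [scale * a for a in v]


def radial_norm(x: Vector) -> float:
    """Distance from the origin, arccosh(x0)."""
    return math.acosh(max(1.0, x[0]))


def einstein_midpoint(points: Sequence[Vector],
                      weights: Optional[Sequence[float]] = None) -> Vector:
    """Standard weighted Einstein midpoint (not the paper's outward variant).

    Each Lorentz point x maps to Klein coordinates k = x[1:] / x[0] with
    Lorentz factor gamma = x[0]. The midpoint sum(w * gamma * k) / sum(w * gamma)
    is computed in the Klein ball and lifted back onto the hyperboloid.
    Weights (e.g. attention scores) must be non-negative and not all zero.
    """
    if weights is None:
        weights = [1.0] * len(points)
    dim = len(points[0]) - 1
    num = [0.0] * dim
    den = 0.0
    for w, x in zip(weights, points):
        gamma = x[0]
        den += w * gamma
        for i in range(dim):
            num[i] += w * gamma * (x[i + 1] / x[0])  # w * gamma * Klein coordinate
    k = [a / den for a in num]
    # Convexity of the Klein ball keeps |k| < 1, so this lift is well defined.
    x0 = 1.0 / math.sqrt(1.0 - sum(a * a for a in k))
    return [x0] + [x0 * a for a in k]


if __name__ == "__main__":
    a = expmap0([2.0, 0.0])   # token at radius 2
    b = expmap0([-2.0, 0.0])  # token at radius 2, opposite direction
    c = expmap0([0.0, 2.0])   # token at radius 2, orthogonal direction
    print(radial_norm(einstein_midpoint([a, b])))  # 0.0: collapses to the origin
    print(radial_norm(einstein_midpoint([a, c])))  # about 0.83, well below 2
```

Consider two tokens at radius 2. If they point in opposite directions, their midpoint lands exactly on the origin. If they are orthogonal, the midpoint's radius falls to about 0.83. The abstract states that the outward variant is designed to prevent this kind of collapse during sequence aggregation, but its mechanism is not reproduced here.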
Problem

Research questions and friction points this paper is trying to address.

dense retrieval
retrieval-augmented generation
hyperbolic geometry
hierarchical structure
embedding geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

hyperbolic geometry
dense retrieval
retrieval-augmented generation
hierarchical representation
Einstein midpoint