Knowledge Pyramid Construction for Multi-Level Retrieval-Augmented Generation

📅 2024-07-31
📈 Citations: 2
✨ Influential: 0
🤖 AI Summary
Existing RAG methods prioritize recall at the expense of precision. To address this, we propose PolyRAG, a framework that builds a three-tier knowledge pyramid comprising an ontology, a knowledge graph, and text chunks. It uses a cascaded top-down retrieval strategy, combined with cross-layer augmentation and cross-layer filtering, to improve precision and recall together. Key contributions include: (1) multi-granularity hierarchical knowledge modeling; (2) a dynamic ontology update mechanism; and (3) two domain-specific retrieval benchmarks, one academic and one financial. On these benchmarks, PolyRAG outperforms 19 state-of-the-art methods. Notably, it raises the F1 score of GPT-4 from 0.1636 to 0.8109 (+395%).

📝 Abstract
This paper addresses the need for improved precision in existing knowledge-enhanced question-answering frameworks, specifically Retrieval-Augmented Generation (RAG) methods, which primarily focus on enhancing recall. We propose a multi-layer knowledge pyramid approach within the RAG framework to achieve a better balance between precision and recall. The knowledge pyramid consists of three layers: Ontologies, Knowledge Graphs (KGs), and chunk-based raw text. We employ cross-layer augmentation techniques for comprehensive knowledge coverage and for dynamic updates of the Ontology schema and instances. To ensure compactness, we use cross-layer filtering methods to condense knowledge in the KGs. Our approach, named PolyRAG, follows a waterfall model for retrieval, starting from the top of the pyramid and progressing down until a confident answer is obtained. We introduce two benchmarks for domain-specific knowledge retrieval, one in the academic domain and the other in the financial domain. Comprehensive experiments validate the approach, which outperforms 19 SOTA methods. Encouragingly, augmenting GPT-4 with the proposed method raises its F1 score from 0.1636 to 0.8109, a 395% gain.
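
The reported gain can be checked with a short calculation. The sketch below assumes the standard definition of F1 as the harmonic mean of precision P and recall R (the abstract does not restate it) and reads "gain" as the relative improvement over the baseline score.

```latex
F_1 = \frac{2PR}{P + R}

\text{relative gain}
  = \frac{F_1^{\text{PolyRAG}} - F_1^{\text{baseline}}}{F_1^{\text{baseline}}}
  = \frac{0.8109 - 0.1636}{0.1636}
  = \frac{0.6473}{0.1636}
  \approx 3.957
```

A ratio of about 3.957 is roughly a 395.7% improvement, consistent with the 395% quoted in the abstract if that figure is truncated rather than rounded.
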
Problem

Research questions and friction points this paper is trying to address.

Enhance precision in RAG frameworks
Balance precision and recall
Dynamic updates for knowledge bases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-layer knowledge pyramid
Cross-layer augmentation techniques
Waterfall retrieval model
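
The waterfall retrieval idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy data, the lookup functions, the confidence scores, and the `threshold` parameter are all hypothetical, since the abstract says only that retrieval starts at the top of the pyramid and moves down until a confident answer is obtained.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy data standing in for the three pyramid layers.
TOY_ONTOLOGY = {"who supervises alice?": "Bob"}
TOY_KG = {("Carol", "works at"): "Acme Bank"}
TOY_CHUNKS = ["Dave joined the lab in 2019 as a postdoc."]

# A lookup returns (answer, confidence) or None if the layer has nothing.
Lookup = Callable[[str], Optional[Tuple[str, float]]]


def ontology_lookup(question: str) -> Optional[Tuple[str, float]]:
    # Exact match on a normalized question; confidence fixed at 1.0 (assumed).
    answer = TOY_ONTOLOGY.get(question.lower().strip())
    return (answer, 1.0) if answer is not None else None


def kg_lookup(question: str) -> Optional[Tuple[str, float]]:
    # Match a triple when both subject and relation occur in the question.
    q = question.lower()
    for (subject, relation), obj in TOY_KG.items():
        if subject.lower() in q and relation in q:
            return obj, 0.9  # assumed confidence for a triple match
    return None


def chunk_lookup(question: str) -> Optional[Tuple[str, float]]:
    # Score chunks by the fraction of question words they contain.
    q_words = set(question.lower().strip("?").split())
    if not q_words:
        return None
    best, best_score = None, 0.0
    for chunk in TOY_CHUNKS:
        c_words = set(chunk.lower().strip(".").split())
        score = len(q_words & c_words) / len(q_words)
        if score > best_score:
            best, best_score = chunk, score
    return (best, best_score) if best is not None else None


# Layers ordered from the top of the pyramid to the bottom.
LAYERS: List[Tuple[str, Lookup]] = [
    ("ontology", ontology_lookup),
    ("knowledge_graph", kg_lookup),
    ("text_chunks", chunk_lookup),
]


def waterfall_retrieve(
    question: str,
    layers: List[Tuple[str, Lookup]],
    threshold: float = 0.8,
) -> Optional[Tuple[str, str]]:
    """Return (layer_name, answer) from the first layer that is confident."""
    for name, lookup in layers:
        result = lookup(question)
        if result is None:
            continue  # nothing found here, fall through to the next layer
        answer, confidence = result
        if confidence >= threshold:
            return name, answer
    return None  # no layer produced a sufficiently confident answer
```

With the toy data, `waterfall_retrieve("Who supervises Alice?", LAYERS)` stops at the ontology layer, while a question only the raw text can address reaches the chunk layer and is returned only if its overlap score clears the threshold. Stopping at the first confident layer is what favors precision: lower, noisier layers are consulted only when the more structured ones cannot answer.
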