🤖 AI Summary
Retrieval-Augmented Generation (RAG) applied to high-volume, low-information-density data suffers from incomplete retrieval and weak relational awareness, which lead to fragmented answers. To address this, the paper proposes the Pseudo-Knowledge Graph (PKG) framework. PKG eschews explicit triple extraction; instead, it uses meta-paths to guide multi-hop semantic retrieval and stores raw text natively in the graph structure. It combines in-graph text modeling with hybrid vector retrieval to construct relation-enhanced contexts. Its core contribution is a dual-mechanism design, described as the first of its kind: meta-path-driven navigation coupled with native text preservation, which enables implicit modeling of entity relationships without additional annotation cost. Evaluated on the Open Compass and MultiHop-RAG benchmarks, PKG is reported to improve multi-hop question answering accuracy and long-tail fact recall, indicating its effectiveness for relational reasoning over massive, sparse datasets.
📝 Abstract
The advent of Large Language Models (LLMs) has revolutionized natural language processing. However, these models face challenges in retrieving precise information from vast datasets. Retrieval-Augmented Generation (RAG) was developed to combine LLMs with external information retrieval systems and thereby improve the accuracy and contextual grounding of responses. Despite these improvements, RAG still struggles with comprehensive retrieval in high-volume, low-information-density databases and lacks relational awareness, leading to fragmented answers. To address these limitations, this paper introduces the Pseudo-Knowledge Graph (PKG) framework, which integrates Meta-path Retrieval, In-graph Text, and Vector Retrieval into LLMs. By preserving natural language text and leveraging multiple retrieval techniques, PKG offers a richer knowledge representation and improves the accuracy of information retrieval. Extensive evaluations on the Open Compass and MultiHop-RAG benchmarks demonstrate the framework's effectiveness in managing large volumes of data and complex relationships.
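As a rough illustration of how meta-path retrieval, in-graph text, and vector retrieval might fit together, the following minimal Python sketch builds a toy graph whose nodes keep their raw text, walks a typed meta-path from a seed entity, and merges the result with a similarity search over chunk text. It is not the paper's implementation: the graph contents, the names `NODES`, `EDGES`, `follow_meta_path`, `vector_search`, and `retrieve`, and the bag-of-words cosine similarity (a stand-in for a real embedding model) are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy graph: every node has a type and keeps its raw text
# (the "in-graph text" idea), instead of being reduced to triples.
NODES = {
    "e1": {"type": "entity", "text": "Marie Curie"},
    "c1": {"type": "chunk", "text": "Marie Curie won the Nobel Prize in Physics in 1903."},
    "e2": {"type": "entity", "text": "Nobel Prize"},
    "c2": {"type": "chunk", "text": "The Nobel Prize is awarded in Stockholm each December."},
    "c3": {"type": "chunk", "text": "Stockholm is the capital of Sweden."},
}
EDGES = {
    "e1": ["c1"],
    "c1": ["e1", "e2"],
    "e2": ["c1", "c2"],
    "c2": ["e2"],
    "c3": [],
}


def follow_meta_path(start, meta_path):
    """Return nodes reached from `start` by walks whose node types match `meta_path`."""
    if NODES[start]["type"] != meta_path[0]:
        return []
    frontier = [start]
    for wanted_type in meta_path[1:]:
        next_frontier = []
        for node in frontier:
            for neighbor in EDGES[node]:
                if NODES[neighbor]["type"] == wanted_type and neighbor not in next_frontier:
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return frontier


def bag_of_words(text):
    """Assumed stand-in for an embedding: lowercase token counts."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def vector_search(query, k=2):
    """Rank chunk nodes by similarity between the query and their stored text."""
    query_vec = bag_of_words(query)
    chunks = [n for n, data in NODES.items() if data["type"] == "chunk"]
    chunks.sort(key=lambda n: cosine(query_vec, bag_of_words(NODES[n]["text"])), reverse=True)
    return chunks[:k]


def retrieve(query, seed_entity, meta_path, k=2):
    """Merge vector hits with meta-path hits and return the raw text as LLM context."""
    hits = vector_search(query, k)
    for node in follow_meta_path(seed_entity, meta_path):
        if node not in hits:
            hits.append(node)
    return [NODES[n]["text"] for n in hits]


if __name__ == "__main__":
    context = retrieve(
        "Where is the prize Marie Curie won awarded?",
        seed_entity="e1",
        meta_path=["entity", "chunk", "entity", "chunk"],
        k=1,
    )
    for passage in context:
        print(passage)
```

In this toy setup, the similarity search with `k=1` surfaces only the chunk that mentions Marie Curie, while the `entity -> chunk -> entity -> chunk` walk also reaches the chunk about where the prize is awarded, which is the kind of relational gap the abstract attributes to plain RAG.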