🤖 AI Summary
This work addresses the hallucinations that large language models produce on knowledge-intensive tasks over black-box knowledge graphs, tracing them to insufficient recall and precision in retrieval. The paper gives the first formal definition of the Optimal Informative Subgraph Retrieval (OISR) problem and establishes its computational complexity (NP-hardness and APX-hardness). To tackle this challenge, the authors propose BubbleRAG, a training-free framework that improves recall and precision simultaneously by grouping semantic anchors, heuristically expanding them into candidate evidence subgraphs, and combining composite ranking with reasoning-aware expansion. Evaluated on multi-hop question answering benchmarks, BubbleRAG achieves state-of-the-art performance, outperforming strong baselines in both F1 score and accuracy, while remaining plug-and-play with no model retraining required.
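For reference, the classical Group Steiner Tree problem, of which OISR is described as a variant, can be stated as follows. This is the standard GST formulation, not the paper's exact OISR objective, which is defined in the full text and may differ:

```latex
% Classical Group Steiner Tree (GST); OISR is a variant of this problem.
% Given an edge-weighted graph and k terminal groups, find a minimum-cost
% tree that touches at least one vertex of every group.
\begin{aligned}
\text{Given: } & G = (V, E),\quad w : E \to \mathbb{R}_{\ge 0},\quad
                 \text{groups } g_1, \dots, g_k \subseteq V \\
\text{Find: }  & \text{a tree } T \subseteq G \text{ minimizing } \sum_{e \in T} w(e) \\
\text{s.t. }   & V(T) \cap g_i \neq \emptyset \quad \text{for all } i = 1, \dots, k
\end{aligned}
```

GST is NP-hard and hard to approximate, which is consistent with the complexity results claimed for OISR.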
📝 Abstract
Large Language Models (LLMs) exhibit hallucinations in knowledge-intensive tasks. Graph-based retrieval-augmented generation (RAG) has emerged as a promising solution, yet existing approaches suffer from fundamental recall and precision limitations when operating over black-box knowledge graphs -- graphs whose schema and structure are unknown in advance. We identify three core challenges that cause recall loss (semantic instantiation uncertainty and structural path uncertainty) and precision loss (evidential comparison uncertainty). To address these challenges, we formalize the retrieval task as the Optimal Informative Subgraph Retrieval (OISR) problem -- a variant of Group Steiner Tree -- and prove it to be NP-hard and APX-hard. We propose BubbleRAG, a training-free pipeline that systematically optimizes for both recall and precision through semantic anchor grouping, heuristic bubble expansion to discover candidate evidence graphs (CEGs), composite ranking, and reasoning-aware expansion. Experiments on multi-hop QA benchmarks demonstrate that BubbleRAG achieves state-of-the-art results, outperforming strong baselines in both F1 and accuracy while remaining plug-and-play.
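The abstract names the pipeline stages (anchor grouping, bubble expansion, composite ranking, reasoning-aware expansion) without specifying them. As an illustration only, the expand-and-rank idea might be sketched as below; the toy graph, the hop limit, and the scoring weights are all assumptions for this sketch, not the paper's actual algorithm:

```python
from collections import deque

# Toy knowledge graph as an adjacency list of (relation, neighbor) pairs.
KG = {
    "Einstein": [("born_in", "Ulm"), ("field", "Physics")],
    "Ulm": [("located_in", "Germany")],
    "Physics": [("studies", "Matter")],
    "Germany": [("capital", "Berlin")],
}

def expand_bubble(graph, anchor, max_hops=2):
    """Collect the nodes reachable from an anchor within max_hops (one 'bubble')."""
    seen, frontier = {anchor}, deque([(anchor, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for _, nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

def rank_candidates(bubbles, anchors):
    """Composite score: reward anchor coverage, penalize subgraph size."""
    def score(bubble):
        coverage = sum(a in bubble for a in anchors)
        return coverage - 0.1 * len(bubble)  # illustrative compactness weight
    return sorted(bubbles, key=score, reverse=True)

anchors = ["Einstein", "Germany"]          # semantic anchors mined from a question
bubbles = [expand_bubble(KG, a) for a in anchors]
best = rank_candidates(bubbles, anchors)[0]  # candidate evidence subgraph to pass to the LLM
```

A real system would expand bubbles over a remote, schema-unknown graph API and fold LLM reasoning signals into the ranking; the sketch only shows the structure of the expand-then-rank loop.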