SAGE: Structure Aware Graph Expansion for Retrieval of Heterogeneous Data

📅 2026-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses multi-hop question answering over heterogeneous data (text, tables, and graph nodes), where existing approaches either rely on entity-level knowledge graphs that are costly to build and maintain or run flat similarity search over independently chunked content, which discards structural cues. The authors propose SAGE, a framework for structure-aware, graph-augmented retrieval that does not require an entity-level knowledge graph. SAGE builds a chunk-level graph offline from metadata-driven similarities and, at query time, expands the one-hop neighbors of the seed chunks returned by an initial retriever, then filters those neighbors with dense+sparse retrieval. The initial retriever is hybrid dense+sparse retrieval for corpora with implicit cross-modal structure and the agentic retriever SPARK for explicit schema graphs. On OTT-QA and STaRK, SAGE improves retrieval recall by 5.7 and 8.5 points, respectively, over baselines.
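
The query-time step described above (expand one-hop neighbors of the seeds, then filter them with dense and sparse signals) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `expand_and_filter`, the precomputed score dictionaries, and the weighted-sum fusion with weight `alpha` are all assumptions made for the example, since the summary does not say how the dense and sparse scores are combined.

```python
from typing import Dict, List, Set


def expand_and_filter(
    seeds: List[str],
    graph: Dict[str, Set[str]],
    dense_score: Dict[str, float],
    sparse_score: Dict[str, float],
    k_prime: int,
    alpha: float = 0.5,  # hypothetical fusion weight, not from the paper
) -> List[str]:
    """Return the seeds plus up to k_prime one-hop neighbors, ranked by a hybrid score."""
    seed_set = set(seeds)

    # Collect first-hop neighbors of every seed chunk.
    candidates: Set[str] = set()
    for seed in seeds:
        candidates |= graph.get(seed, set())
    candidates -= seed_set  # seeds are already retrieved

    # Assumed fusion: a weighted sum of query-to-chunk dense and sparse scores.
    def hybrid(chunk_id: str) -> float:
        return (alpha * dense_score.get(chunk_id, 0.0)
                + (1.0 - alpha) * sparse_score.get(chunk_id, 0.0))

    # Highest score first; ties broken by chunk id so the result is deterministic.
    ranked = sorted(candidates, key=lambda c: (-hybrid(c), c))
    return list(seeds) + ranked[:k_prime]
```

Called with one seed whose neighbors are two other chunks, the function keeps the seed and appends only the neighbor with the higher fused score when `k_prime=1`. Other fusion schemes (for example, rank-based fusion) would slot into `hybrid` without changing the expansion logic.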

📝 Abstract
Retrieval-augmented question answering over heterogeneous corpora requires connected evidence across text, tables, and graph nodes. While entity-level knowledge graphs support structured access, they are costly to construct and maintain, and inefficient to traverse at query time. In contrast, standard retriever-reader pipelines use flat similarity search over independently chunked text, missing multi-hop evidence chains across modalities. We propose the SAGE (Structure Aware Graph Expansion) framework, which (i) constructs a chunk-level graph offline using metadata-driven similarities with percentile-based pruning, and (ii) performs online retrieval by running an initial baseline retriever to obtain k seed chunks, expanding their first-hop neighbors, and then filtering those neighbors using dense+sparse retrieval to select k' additional chunks. We instantiate the initial retriever using hybrid dense+sparse retrieval for implicit cross-modal corpora and SPARK (Structure Aware Planning Agent for Retrieval over Knowledge Graphs), an agentic retriever, for explicit schema graphs. On OTT-QA and STaRK, SAGE improves retrieval recall by 5.7 and 8.5 points over baselines.
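
Step (i) of the abstract, offline graph construction with percentile-based pruning, might look something like the sketch below. It is only an illustration under stated assumptions: the abstract does not specify the metadata similarity, so Jaccard overlap of metadata token sets stands in for it, and the nearest-rank percentile, the default `pct=90.0`, and the names `build_chunk_graph` and `nearest_rank_percentile` are hypothetical choices for this example.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Assumed stand-in for the paper's metadata-driven similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_rank_percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list (pct in (0, 100])."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def build_chunk_graph(
    metadata: Dict[str, Set[str]],
    pct: float = 90.0,  # hypothetical pruning percentile
) -> Dict[str, Set[str]]:
    """Link chunk pairs whose similarity reaches the pct-th percentile of all pairs."""
    graph: Dict[str, Set[str]] = {chunk_id: set() for chunk_id in metadata}

    # Score every unordered pair of chunks once.
    scored = [
        (jaccard(metadata[u], metadata[v]), u, v)
        for u, v in combinations(sorted(metadata), 2)
    ]
    if not scored:
        return graph

    threshold = nearest_rank_percentile([sim for sim, _, _ in scored], pct)

    # Keep only edges at or above the threshold; never link unrelated chunks.
    for sim, u, v in scored:
        if sim >= threshold and sim > 0.0:
            graph[u].add(v)
            graph[v].add(u)
    return graph
```

Scoring all pairs is quadratic in the number of chunks, which is acceptable for a small illustration; a real corpus would likely need blocking or approximate nearest-neighbor search before pruning. The resulting adjacency map is the structure that the online one-hop expansion in step (ii) would read from.
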
Problem

Research questions and friction points this paper is trying to address.

heterogeneous data
retrieval-augmented question answering
multi-hop evidence
knowledge graph
cross-modal retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Structure Aware Graph Expansion
chunk-level graph
hybrid dense+sparse retrieval
multi-hop evidence retrieval
heterogeneous data retrieval