🤖 AI Summary
This work addresses the efficiency and scalability challenges that the irregular structure of knowledge graphs poses for retrieval, particularly for multi-hop compositional queries. The authors propose SeedER (Seed-and-Expand Retrieval), a framework that combines a seed-expansion mechanism with reinforcement learning. It first retrieves a compact set of core seed nodes through lightweight dense and entity-based retrieval, then iteratively expands this set via a learned graph-aware policy, decomposing global reasoning into reusable local decisions while tightly controlling expansion cost. SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems. The authors also show theoretical limitations of dense retrieval on compositional graph queries and establish advantages of SeedER from the perspectives of compositional generalization and graph-constrained submodular optimization.
📝 Abstract
Knowledge graphs (KGs) offer a rich representation for relational knowledge, but their irregular structure makes retrieval challenging: ego-graph expansion grows rapidly, and dense embedding methods struggle with multi-hop compositional queries. Existing agent-based graph exploration approaches, while expressive, are often too expensive for large-scale retrieval. We introduce SeedER (Seed-and-Expand Retrieval), a retrieval framework that explicitly leverages KG structure through iterative, low-cost expansion. SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost. We show theoretical limitations of dense retrieval on compositional graph queries, and establish advantages of SeedER from both compositional generalization and graph-constrained submodular optimization perspectives. Empirically, SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems.
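The seed-then-expand loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `seed_and_expand`, the `score` callable (a hypothetical stand-in for the learned graph-aware policy), and the `steps`, `budget_per_step`, and `threshold` parameters are all assumptions made for illustration. The sketch only shows the general shape of the idea: start from a small seed set, look at its graph neighbors, and admit a bounded number of them per step so that the candidate set stays compact.

```python
from typing import Callable, Dict, List, Set

# Assumed toy representation: adjacency list mapping a node to its neighbors.
Graph = Dict[str, List[str]]


def seed_and_expand(
    graph: Graph,
    seeds: Set[str],
    score: Callable[[str, Set[str]], float],
    steps: int = 2,
    budget_per_step: int = 2,
    threshold: float = 0.0,
) -> Set[str]:
    """Grow a candidate set from seed nodes under a per-step budget.

    `score(node, selected)` is a hypothetical placeholder for a learned
    policy; here it is any function returning a relevance value.
    """
    selected = set(seeds)
    for _ in range(steps):
        # Frontier: neighbors of the current set that are not yet selected.
        frontier = {
            nbr
            for node in selected
            for nbr in graph.get(node, [])
            if nbr not in selected
        }
        if not frontier:
            break
        # Score each frontier node once against the current set.
        scored = {n: score(n, selected) for n in frontier}
        # Rank by descending score; break ties by name for determinism.
        ranked = sorted(frontier, key=lambda n: (-scored[n], n))
        # Admit at most `budget_per_step` nodes that clear the threshold.
        chosen = [n for n in ranked[:budget_per_step] if scored[n] > threshold]
        if not chosen:
            break
        selected.update(chosen)
    return selected
```

Each step makes only local decisions over the immediate frontier, which is what keeps the cost bounded: the number of scored nodes depends on the neighborhood of the current set, not on the size of the whole graph. In a real system the seeds would come from a retriever and the scoring function from a trained policy; both are left abstract here.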