OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Real-world knowledge is fragmented across heterogeneous sources such as text corpora, relational tables, and knowledge graphs, yet existing retrieval systems support only a single source type and a fixed query language, which leaves that knowledge siloed. This work proposes a unified retrieval framework that uses natural language understanding to identify the most suitable knowledge sources for a given query and translates the query into each source's native query language; the corresponding backend engine then executes the translated query. Because it preserves the structural characteristics of individual sources while coordinating across them, the approach departs from conventional homogeneous fusion paradigms. Evaluated on a benchmark of 13 datasets and 309 knowledge bases, the method outperforms single-source retrieval baselines, supporting its use as a general-purpose interface for heterogeneous knowledge retrieval.

📝 Abstract

Real-world information needs require access to structurally diverse knowledge sources, from unstructured text and relational tables to knowledge graphs and property graphs. Existing retrievers, however, operate over one source at a time under a fixed query language, leaving the broader landscape of available knowledge fragmented behind incompatible interfaces. A natural attempt at unification would collapse these sources into a shared space, but this erases the structural affordances (such as schemas, ontologies, and compositional operators) that give each source its expressive power. Effective retrieval over diverse knowledge therefore requires not homogenization but an overarching layer that meets each source on its own terms. To achieve this, we present OmniRetrieval, a framework that takes any natural-language query, identifies appropriate knowledge sources, and dispatches source-native queries to their native execution engines. Across an extensive benchmark spanning 13 datasets and 309 distinct knowledge bases over text, relational, and graph-structured sources, OmniRetrieval exceeds single-source baselines, demonstrating that it can serve as a general-purpose interface to heterogeneous sources while preserving the structural distinctions that make each source valuable.
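
The three steps the abstract names (identify sources, translate to a source-native query, execute on the native engine) can be illustrated with a toy sketch. This is not the paper's implementation: the `Source` class, the cue-word router, the fixed SQL template, and the sample data are all hypothetical stand-ins, chosen only to show the control flow with an in-memory SQLite table and a small passage list.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Source:
    """Hypothetical wrapper pairing a query translator with a native engine."""
    name: str
    translate: Callable[[str], str]         # natural language -> native query
    execute: Callable[[str], List[tuple]]   # run the native query on the backend


# Toy relational backend: an in-memory SQLite table (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, population INTEGER)")
conn.executemany("INSERT INTO city VALUES (?, ?)",
                 [("Oslo", 700000), ("Bergen", 290000)])


def to_sql(question: str) -> str:
    # Assumption: a fixed template stands in for a learned text-to-SQL step.
    return "SELECT name, population FROM city ORDER BY population DESC LIMIT 1"


def run_sql(query: str) -> List[tuple]:
    return conn.execute(query).fetchall()


# Toy text backend: keyword matching over a small list of passages.
PASSAGES = ["Oslo is the capital of Norway.", "Bergen is known for its rain."]


def to_keywords(question: str) -> str:
    # Keep lowercase words longer than three characters as search terms.
    words = (w.strip("?.,").lower() for w in question.split())
    return " ".join(w for w in words if len(w) > 3)


def run_keywords(query: str) -> List[tuple]:
    terms = query.split()
    return [(p,) for p in PASSAGES if any(t in p.lower() for t in terms)]


SOURCES: Dict[str, Source] = {
    "relational": Source("relational", to_sql, run_sql),
    "text": Source("text", to_keywords, run_keywords),
}


def route(question: str) -> List[str]:
    # Assumption: cue words stand in for the source-identification step.
    q = question.lower()
    if any(cue in q for cue in ("largest", "population", "how many")):
        return ["relational"]
    return ["text"]


def retrieve(question: str) -> Dict[str, List[tuple]]:
    """Route the question, translate it per source, and execute natively."""
    results = {}
    for name in route(question):
        source = SOURCES[name]
        results[name] = source.execute(source.translate(question))
    return results


print(retrieve("Which city has the largest population?"))
print(retrieve("What is Bergen known for?"))
```

Each backend keeps its own query form here (SQL for the table, keywords for the passages), which mirrors the abstract's point that the unifying layer sits above the sources rather than collapsing them into one representation.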

Problem

Research questions and friction points this paper is trying to address.

heterogeneous knowledge sources

unified retrieval

structural affordances

knowledge fragmentation

cross-source querying

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified retrieval

heterogeneous knowledge sources

source-native querying