🤖 AI Summary
This work addresses the limitations of existing retrieval-augmented generation (RAG) systems on semi-structured corpora, particularly their inability to support exact filtering, aggregation, and exhaustive retrieval over structured attributes across multiple documents. To overcome these challenges, the authors propose DualGraph, a framework that integrates semantic and symbolic views of the documents. It uses a Textual Knowledge Graph for semantic retrieval and a Symbolic Knowledge Graph for interpretable queries over typed subject-predicate-object triples. Multiple strategies select or combine the evidence from the two views. On the newly introduced SpecsQA benchmark, DualGraph consistently outperforms dense-retrieval, GraphRAG, purely symbolic, and table-oriented baselines on both open-ended and specification-oriented questions.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems for question answering typically retrieve evidence by semantic similarity between the query and document chunks. While effective for unstructured text, this approach is less reliable on semi-structured corpora where answering may require exact filtering, aggregation, or exhaustive retrieval over structured attributes across multiple documents. Symbolic approaches support such operations, but they are often brittle on noisy natural-language corpora. We address this gap with DualGraph, a RAG framework that represents documents through two complementary views: a Textual Knowledge Graph for semantic retrieval and a Symbolic Knowledge Graph for symbolic querying over typed subject-predicate-object triples. Building on these two components, we provide multiple strategies for selecting or combining semantic and symbolic evidence. We also introduce SpecsQA, a benchmark from a commercial shopping website with semi-structured product documents and manually curated questions spanning open-ended and specification-oriented retrieval. Experiments show that DualGraph consistently outperforms state-of-the-art dense-retrieval, GraphRAG, symbolic, and table-oriented baselines across question types. Code and data are available at https://github.com/corneliocristina/DualGraphRAG.
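To illustrate why exact filtering and aggregation are easier over typed triples than over text chunks ranked by similarity, here is a minimal Python sketch. It is not taken from the DualGraph codebase: the `Triple` class, the `select` and `aggregate` helpers, and the product data are all hypothetical names and values invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Union

Value = Union[str, float]


@dataclass(frozen=True)
class Triple:
    """One typed subject-predicate-object fact (hypothetical schema)."""
    subject: str
    predicate: str
    obj: Value  # a float for numeric attributes, a string otherwise


# Invented product specifications, for illustration only.
TRIPLES = [
    Triple("laptop_a", "weight_kg", 1.2),
    Triple("laptop_a", "ram_gb", 16.0),
    Triple("laptop_b", "weight_kg", 2.1),
    Triple("laptop_b", "ram_gb", 32.0),
    Triple("laptop_c", "weight_kg", 1.4),
    Triple("laptop_c", "ram_gb", 8.0),
]


def select(triples: List[Triple], predicate: str,
           condition: Callable[[Value], bool]) -> List[str]:
    """Exhaustively return every subject whose value satisfies the condition."""
    return sorted(
        t.subject
        for t in triples
        if t.predicate == predicate and condition(t.obj)
    )


def aggregate(triples: List[Triple], predicate: str,
              reducer: Callable[[List[Value]], Value]) -> Value:
    """Reduce all values of one predicate, e.g. with max, min, or len."""
    return reducer([t.obj for t in triples if t.predicate == predicate])


# Exact filter: all laptops lighter than 1.5 kg.
light = select(TRIPLES, "weight_kg", lambda v: v < 1.5)

# Aggregation: the largest RAM size across all documents.
max_ram = aggregate(TRIPLES, "ram_gb", max)

# Joint constraint over two attributes: light and at least 16 GB of RAM.
light_and_large = sorted(
    set(light) & set(select(TRIPLES, "ram_gb", lambda v: v >= 16))
)

print(light)            # ['laptop_a', 'laptop_c']
print(max_ram)          # 32.0
print(light_and_large)  # ['laptop_a']
```

A similarity-based retriever returns only the top-ranked chunks, so it cannot guarantee that every matching product is found. The symbolic query above scans all triples for the attribute and is exhaustive by construction.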