🤖 AI Summary
This work addresses the limitations of current large language models (LLMs) in open-domain question answering, where reliance on linear reasoning hinders the parallel integration of multiple premises and sub-questions, often resulting in logical inconsistencies. To overcome this, we propose the Self-Graph Reasoning (SGR) framework—the first approach enabling general-purpose LLMs to autonomously construct graph-structured reasoning paths and explicitly articulate their reasoning process prior to answer generation. SGR incorporates graph-based modeling, a multi-candidate reasoning graph fusion strategy, and a specialized training dataset for fine-tuning. Experimental results demonstrate that SGR achieves an average improvement of 17.74% across five question-answering benchmarks, elevating the performance of LLaMA-3.3-70B to match that of GPT-4o and surpass Claude-3.5-Haiku.
📝 Abstract
Large Language Models (LLMs) show strong reasoning ability in open-domain question answering, yet their reasoning processes are typically linear and often logically inconsistent. In contrast, real-world reasoning requires integrating multiple premises and solving subproblems in parallel. Existing methods, such as Chain-of-Thought (CoT), express reasoning in a linear textual form, which may appear coherent but frequently leads to inconsistent conclusions. Recent approaches rely on externally provided graphs and do not explore how LLMs can construct and use their own graph-structured reasoning, particularly in open-domain QA. To fill this gap, we present the first exploration of graph-structured reasoning by LLMs in general-domain question answering. We propose Self-Graph Reasoning (SGR), a framework that enables LLMs to explicitly represent their reasoning process as a structured graph before producing the final answer. We further construct a graph-structured reasoning dataset that merges multiple candidate reasoning graphs into refined graph structures for model training. Experiments on five QA benchmarks spanning general and specialized domains show that SGR improves reasoning consistency and yields an average gain of 17.74% over the base model. The LLaMA-3.3-70B model fine-tuned with SGR performs comparably to GPT-4o and surpasses Claude-3.5-Haiku, demonstrating the effectiveness of graph-structured reasoning.
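The abstract describes two core ideas: representing reasoning as a graph of premises and sub-conclusions, and fusing multiple candidate reasoning graphs into one refined structure. The paper's exact representation and fusion strategy are not given here, so the following is only a minimal illustrative sketch under assumed conventions: each candidate graph is a set of `(premise, conclusion)` edges, fusion is a simple majority vote over edges across candidates, and the fused graph is linearized so each statement is articulated after its premises (the names `fuse_graphs` and `reasoning_order` are hypothetical, not from the paper):

```python
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+

def fuse_graphs(candidates, threshold=0.5):
    """Merge candidate reasoning graphs by majority vote over edges.

    Each candidate is a set of (premise, conclusion) edges. An edge is
    kept if it appears in at least `threshold` of the candidates.
    This is an assumed heuristic, not the paper's actual fusion method.
    """
    counts = Counter(edge for g in candidates for edge in set(g))
    cutoff = threshold * len(candidates)
    return {edge for edge, c in counts.items() if c >= cutoff}

def reasoning_order(edges):
    """Linearize the fused graph so each node appears after its premises."""
    deps = {}
    for premise, conclusion in edges:
        deps.setdefault(conclusion, set()).add(premise)
        deps.setdefault(premise, set())
    return list(TopologicalSorter(deps).static_order())

# Three hypothetical candidate graphs for the same question.
g1 = {("rain", "wet ground"), ("wet ground", "slippery")}
g2 = {("rain", "wet ground"), ("wet ground", "slippery")}
g3 = {("rain", "wet ground"), ("sprinkler", "wet ground")}

fused = fuse_graphs([g1, g2, g3])
# Edges seen in a majority of candidates survive; the
# ("sprinkler", "wet ground") edge (1 of 3) is dropped.
order = reasoning_order(fused)  # premises precede conclusions
```

In this sketch the fused graph, rather than a free-form chain of thought, is what the model would articulate before answering, which is one plausible way to enforce the parallel premise integration the abstract highlights.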