Improving LLMs' Generalized Reasoning Abilities by Graph Problems

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited generalization on complex reasoning tasks, particularly in mathematics, logic, and commonsense reasoning, due to insufficient modeling of abstract relational structures. Method: This paper introduces Graph Problem Reasoning (GPR), a paradigm that systematically encodes diverse abstract reasoning patterns (including pathfinding, topological analysis, and numerical derivation) using graph-structured representations. To support GPR, the authors construct GraphPile, the first large-scale graph reasoning pretraining corpus (23 task categories, 10.9B tokens), and propose a domain-specific continued pretraining (CPT) recipe combining chain-of-thought, program-of-thought, trace-of-execution, and real-world graph data. Contribution/Results: Applying CPT to base models including Llama 3/3.1 and Gemma 2 yields up to +4.9% accuracy on mathematical reasoning and up to +21.2% on logical and commonsense reasoning benchmarks, demonstrating improved cross-domain adaptability and robustness.
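To make the data formats concrete: a program-of-thought instance pairs a graph question with executable code whose output is the answer. The sketch below is an illustrative assumption, not GraphPile's actual schema; the task wording and the `bfs_shortest_path` helper are invented for this example.

```python
from collections import deque

def bfs_shortest_path(edges, source, target):
    """Return the number of hops from source to target in an
    undirected graph given as an edge list, or -1 if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

# Hypothetical GPR pathfinding task: "In the graph with edges (0,1),
# (1,2), (2,4), (0,3), (3,4), what is the shortest path length from
# node 0 to node 4?"
print(bfs_shortest_path([(0, 1), (1, 2), (2, 4), (0, 3), (3, 4)], 0, 4))  # -> 2
```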

📝 Abstract
Large Language Models (LLMs) have made remarkable strides in reasoning tasks, yet their performance often falters on novel and complex problems. Domain-specific continued pretraining (CPT) methods, such as those tailored for mathematical reasoning, have shown promise but lack transferability to broader reasoning tasks. In this work, we pioneer the use of Graph Problem Reasoning (GPR) to enhance the general reasoning capabilities of LLMs. GPR tasks, spanning pathfinding, network analysis, numerical computation, and topological reasoning, require sophisticated logical and relational reasoning, making them ideal for teaching diverse reasoning patterns. To achieve this, we introduce GraphPile, the first large-scale corpus specifically designed for CPT using GPR data. Spanning 10.9 billion tokens across 23 graph tasks, the dataset includes chain-of-thought, program-of-thought, trace of execution, and real-world graph data. Using GraphPile, we train GraphMind on popular base models Llama 3 and 3.1, as well as Gemma 2, achieving up to 4.9 percent higher accuracy in mathematical reasoning and up to 21.2 percent improvement in non-mathematical reasoning tasks such as logical and commonsense reasoning. By being the first to harness GPR for enhancing reasoning patterns and introducing the first dataset of its kind, our work bridges the gap between domain-specific pretraining and universal reasoning capabilities, advancing the adaptability and robustness of LLMs.
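The abstract also names topological reasoning among the 23 graph tasks. As an illustration only (the exact GraphPile question format is not shown on this page; the DAG and the `topological_order` helper are assumptions), a program-of-thought style solution to a topological-ordering task might look like:

```python
from collections import deque

def topological_order(num_nodes, edges):
    """Kahn's algorithm: repeatedly emit nodes with zero in-degree.
    Returns one valid ordering, or None if the graph contains a cycle."""
    indegree = [0] * num_nodes
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:          # directed edge u -> v
        adj[u].append(v)
        indegree[v] += 1

    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order if len(order) == num_nodes else None

# Hypothetical topology task: "Give one valid topological ordering of
# the DAG with edges 0->1, 0->2, 1->3, 2->3."
print(topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # -> [0, 1, 2, 3]
```

A trace-of-execution variant of such an instance would presumably also record intermediate states (for example, the queue contents at each step) as supervision, though the exact serialization is not specified on this page.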
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' general reasoning via graph problem tasks
Addressing lack of transferability in domain-specific pretraining methods
Bridging gap between specialized and universal reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes Graph Problem Reasoning (GPR) for LLMs
Introduces GraphPile, a large-scale GPR corpus
Trains models with diverse reasoning patterns
Qifan Zhang
The Hong Kong University of Science and Technology (Guangzhou)
Nuo Chen
The Hong Kong University of Science and Technology (Guangzhou)
Zehua Li
The Hong Kong University of Science and Technology (Guangzhou)
Miao Peng
The Hong Kong University of Science and Technology (Guangzhou)
Knowledge Graph · Natural Language Processing
Jing Tang
The Hong Kong University of Science and Technology (Guangzhou)
Jia Li
The Hong Kong University of Science and Technology (Guangzhou)