Improving LLMs' Generalized Reasoning Abilities by Graph Problems

📅 2025-07-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited generalization on complex reasoning tasks, particularly in mathematics, logic, and commonsense reasoning, due to insufficient modeling of abstract relational structures. Method: This paper introduces Graph Problem Reasoning (GPR), a paradigm that systematically encodes diverse abstract reasoning patterns (including pathfinding, topological analysis, and numerical derivation) using graph-structured representations. To support GPR, the authors construct GraphPile, the first large-scale graph reasoning pretraining corpus (23 task categories, 10.9B tokens), and propose a domain-specific continued pretraining (CPT) recipe combining chain-of-thought, program-of-thought, trace-of-execution, and real-world graph data. Contribution/Results: Applying CPT to base models including Llama 3/3.1 and Gemma 2 yields up to +4.9% accuracy on mathematical reasoning and up to +21.2% on logical and commonsense reasoning benchmarks, demonstrating improved cross-domain adaptability and robustness.
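To make the data formats concrete: a program-of-thought instance pairs a graph question with executable code whose output is the answer. The sketch below is an illustrative assumption, not GraphPile's actual schema; the task wording and the `bfs_shortest_path` helper are invented for this example.

```python
from collections import deque

def bfs_shortest_path(edges, source, target):
    """Return the number of hops from source to target in an
    undirected graph given as an edge list, or -1 if unreachable."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    queue = deque([(source, 0)])
    visited = {source}
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return -1

# Hypothetical GPR pathfinding task: "In the graph with edges (0,1),
# (1,2), (2,4), (0,3), (3,4), what is the shortest path length from
# node 0 to node 4?"
print(bfs_shortest_path([(0, 1), (1, 2), (2, 4), (0, 3), (3, 4)], 0, 4))  # -> 2
```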

📝 Abstract
Large Language Models (LLMs) have made remarkable strides in reasoning tasks, yet their performance often falters on novel and complex problems. Domain-specific continued pretraining (CPT) methods, such as those tailored for mathematical reasoning, have shown promise but lack transferability to broader reasoning tasks. In this work, we pioneer the use of Graph Problem Reasoning (GPR) to enhance the general reasoning capabilities of LLMs. GPR tasks, spanning pathfinding, network analysis, numerical computation, and topological reasoning, require sophisticated logical and relational reasoning, making them ideal for teaching diverse reasoning patterns. To achieve this, we introduce GraphPile, the first large-scale corpus specifically designed for CPT using GPR data. Spanning 10.9 billion tokens across 23 graph tasks, the dataset includes chain-of-thought, program-of-thought, trace of execution, and real-world graph data. Using GraphPile, we train GraphMind on popular base models Llama 3 and 3.1, as well as Gemma 2, achieving up to 4.9 percent higher accuracy in mathematical reasoning and up to 21.2 percent improvement in non-mathematical reasoning tasks such as logical and commonsense reasoning. By being the first to harness GPR for enhancing reasoning patterns and introducing the first dataset of its kind, our work bridges the gap between domain-specific pretraining and universal reasoning capabilities, advancing the adaptability and robustness of LLMs.
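The abstract also names topological reasoning among the 23 graph tasks. As an illustration only (the exact GraphPile question format is not shown on this page; the DAG and the `topological_order` helper are assumptions), a program-of-thought style solution to a topological-ordering task might look like:

```python
from collections import deque

def topological_order(num_nodes, edges):
    """Kahn's algorithm: repeatedly emit nodes with zero in-degree.
    Returns one valid ordering, or None if the graph contains a cycle."""
    indegree = [0] * num_nodes
    adj = [[] for _ in range(num_nodes)]
    for u, v in edges:          # directed edge u -> v
        adj[u].append(v)
        indegree[v] += 1

    queue = deque(n for n in range(num_nodes) if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in adj[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order if len(order) == num_nodes else None

# Hypothetical topology task: "Give one valid topological ordering of
# the DAG with edges 0->1, 0->2, 1->3, 2->3."
print(topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)]))  # -> [0, 1, 2, 3]
```

A trace-of-execution variant of such an instance would presumably also record intermediate states (for example, the queue contents at each step) as supervision, though the exact serialization is not specified on this page.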
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' general reasoning via graph problem tasks
Addressing lack of transferability in domain-specific pretraining methods
Bridging gap between specialized and universal reasoning capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes Graph Problem Reasoning (GPR) for LLMs
Introduces GraphPile, a large-scale GPR corpus
Trains models with diverse reasoning patterns
Qifan Zhang
The Hong Kong University of Science and Technology (Guangzhou)
Nuo Chen
The Hong Kong University of Science and Technology (Guangzhou)
Zehua Li
The Hong Kong University of Science and Technology (Guangzhou)
Miao Peng
The Hong Kong University of Science and Technology (Guangzhou)
Knowledge Graph · Natural Language Processing
Jing Tang
The Hong Kong University of Science and Technology (Guangzhou)
Jia Li
The Hong Kong University of Science and Technology (Guangzhou)