Linear Reasoning vs. Proof by Cases: Obstacles for Large Language Models in FOL Problem Solving

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a notable weakness of current large language models (LLMs): they handle nonlinear logical reasoning tasks, particularly case-based proofs, far less reliably than linear reasoning, and no dedicated evaluation benchmark exists for this gap. The study systematically distinguishes linear reasoning from case-based proof for the first time and introduces PC-FOL, the first high-quality dataset focused on case-based proofs in first-order logic, comprising formal statements with natural language proofs annotated by professional mathematicians. Through a theoretical analysis grounded in graphical models and extensive experiments with state-of-the-art LLMs, the paper reveals fundamental limitations in modeling complex logical structures, explains the root causes of the performance bottleneck, and establishes both a theoretical framework and an empirical foundation for advancing the reasoning capabilities of future models.

📝 Abstract
To comprehensively evaluate the mathematical reasoning capabilities of Large Language Models (LLMs), researchers have introduced abundant mathematical reasoning datasets. However, most existing datasets focus primarily on linear reasoning, neglecting other proof paradigms such as proof by contradiction and proof by cases, which are crucial for investigating LLMs' reasoning abilities. To address this limitation, we first introduce a novel first-order logic (FOL) dataset named PC-FOL, annotated by professional mathematicians and focused on case-based reasoning problems. Every instance in the dataset is equipped with a manually written natural language proof, clearly distinguishing it from conventional linear reasoning datasets. Our experiments with leading LLMs demonstrate a substantial performance gap between linear reasoning and case-based reasoning problems. To investigate this phenomenon further, we provide a theoretical analysis grounded in graphical models, which explains the observed disparity between the two types of reasoning problems. We hope this work reveals the core challenges in automated natural language mathematical proof generation, paving the way for future research.
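The distinction the abstract draws can be made concrete with a minimal first-order example (our own sketch, not drawn from the PC-FOL dataset): a linear proof chains one implication after the next, whereas a case-based proof must split on a disjunction and discharge every branch before the conclusion follows.

```latex
% Linear reasoning: a single chain of implications.
% From $P(a)$, $\forall x\,(P(x) \to Q(x))$, $\forall x\,(Q(x) \to R(x))$:
%   $P(a) \Rightarrow Q(a) \Rightarrow R(a)$.
%
% Proof by cases: the premise is a disjunction, so each branch
% is handled separately and then combined by disjunction elimination.
% From $P(a) \lor Q(a)$, $\forall x\,(P(x) \to R(x))$, $\forall x\,(Q(x) \to R(x))$:
\begin{proof}
We argue by cases on $P(a) \lor Q(a)$.
\emph{Case 1:} $P(a)$ holds. Instantiating $\forall x\,(P(x) \to R(x))$ at $a$ gives $R(a)$.
\emph{Case 2:} $Q(a)$ holds. Instantiating $\forall x\,(Q(x) \to R(x))$ at $a$ gives $R(a)$.
Either way $R(a)$ holds, so disjunction elimination yields $R(a)$.
\end{proof}
```

The linear proof is a single path through the premises, while the case split forces the prover to maintain and close two independent sub-derivations, which is the structural difference the paper's graphical-model analysis targets.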
Problem

Research questions and friction points this paper is trying to address.

case-based reasoning
linear reasoning
first-order logic
mathematical reasoning
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

case-based reasoning
first-order logic
mathematical reasoning
large language models
proof by cases
Yuliang Ji
Nanjing University of Science and Technology
Fuchen Shen
Westlake University
Jian Wu
Westlake University
Qiujie Xie
Zhejiang University
Yue Zhang
Westlake University
Natural language processing · Natural language understanding · Natural language generation · Machine learning · NLP applications