🤖 AI Summary
This work addresses the prevalent issue in large language models (LLMs) where over 90% of errors in mathematical reasoning stem from reliance on superficial pattern matching rather than genuine understanding of logical relationships. To mitigate this, the authors propose the First-Step Logical Reasoning (FSLR) framework, which explicitly supervises the model to identify key variables and operations in a problem as the initial step of reasoning. This approach transforms implicit logical supervision into a lightweight and efficient training signal derived solely from the first reasoning step. Empirical results demonstrate that FSLR significantly enhances logical reasoning performance, yielding average accuracy improvements of 3.2% on in-distribution tasks and 4.6% on out-of-distribution tasks. Moreover, the method accelerates training by 4–6 times and reduces token consumption by more than 80%, offering both computational efficiency and improved generalization.
📝 Abstract
Recent studies reveal that large language models (LLMs) exhibit limited logical reasoning abilities in mathematical problem-solving, instead often relying on pattern-matching and memorization. We systematically analyze this limitation, focusing on logical relationship understanding, which is a core capability underlying genuine logical reasoning, and reveal that errors related to this capability account for over 90% of incorrect predictions, with Chain-of-Thought Supervised Fine-Tuning (CoT-SFT) failing to substantially reduce these errors. To address this bottleneck, we propose First-Step Logical Reasoning (FSLR), a lightweight training framework targeting logical relationship understanding. Our key insight is that the first planning step, identifying which variables to use and which operation to apply, encourages the model to derive logical relationships directly from the problem statement. By training models on this isolated step, FSLR provides explicit supervision for logical relationship understanding, unlike CoT-SFT which implicitly embeds such relationships within complete solution trajectories. Extensive experiments across multiple models and datasets demonstrate that FSLR consistently outperforms CoT-SFT under both in-distribution and out-of-distribution settings, with average improvements of 3.2% and 4.6%, respectively. Moreover, FSLR achieves 4–6x faster training and reduces training token consumption by over 80%.
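To make the training-signal idea concrete, here is a minimal, hypothetical sketch of how FSLR-style supervision pairs might be constructed from existing CoT data: the target keeps only the first planning step (the variables and the operation), discarding the rest of the solution trajectory. The function name, data fields, and prompt wording are illustrative assumptions, not the paper's actual implementation.

```python
def first_step_target(example):
    """Build a (prompt, target) pair that supervises only the first
    planning step of a CoT solution (illustrative sketch, not the
    paper's exact data format)."""
    problem = example["problem"]
    # Keep only the first reasoning step, which names the operands
    # and the operation; later steps are dropped entirely.
    first_step = example["cot_steps"][0]
    prompt = (
        f"Problem: {problem}\n"
        "First step: state which variables to use and which operation to apply."
    )
    return prompt, first_step


example = {
    "problem": "Tom has 3 apples and buys 5 more. How many apples does he have?",
    "cot_steps": [
        "Use the quantities 3 and 5 with addition: 3 + 5.",
        "3 + 5 = 8, so Tom has 8 apples.",
    ],
}

prompt, target = first_step_target(example)
# The target is only the first step, so per-example supervised tokens
# shrink roughly in proportion to how much of the trajectory is dropped,
# consistent with the reported >80% token reduction.
```

Because each target is a single short step rather than a full solution, both the supervised sequence length and the number of gradient-bearing tokens drop sharply, which is where the reported training speedups come from.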