🤖 AI Summary
Large reasoning models (LRMs) frequently exhibit counterintuitive reasoning behaviors, and the behaviors they should exhibit have lacked a formal characterization. To address this, we propose the *Laws of Reasoning* (LoRe) framework, which formalizes two laws: a *compute law*, hypothesizing that reasoning compute should scale linearly with question complexity, and a supplementary *accuracy law*. Because question complexity is hard to quantify directly, we introduce LoRe-Bench, a diagnostic benchmark that tests two tractable properties of the laws: monotonicity and compositionality. We further design a fine-tuning approach that enforces compute-law compositionality. Our key contributions are: (1) a formal, empirically testable framework of reasoning laws; (2) the finding that most reasoning models exhibit reasonable monotonicity but lack compositionality; and (3) consistent improvements in reasoning performance across multiple benchmarks, together with synergistic effects across properties and laws.
📝 Abstract
Despite the superior performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, leading to suboptimal reasoning capabilities. To theoretically formalize the desired reasoning behaviors, this paper presents the Laws of Reasoning (LoRe), a unified framework that characterizes intrinsic reasoning patterns in LRMs. We first propose the compute law, which hypothesizes that reasoning compute should scale linearly with question complexity. Beyond compute, we extend LoRe with a supplementary accuracy law. Because question complexity is difficult to quantify in practice, we examine these hypotheses through two properties of the laws: monotonicity and compositionality. We therefore introduce LoRe-Bench, a benchmark that systematically measures these two tractable properties for large reasoning models. Evaluation shows that most reasoning models exhibit reasonable monotonicity but lack compositionality. In response, we develop an effective fine-tuning approach that enforces compute-law compositionality. Extensive empirical studies demonstrate that better compliance with the compute law yields consistently improved reasoning performance on multiple benchmarks, and they uncover synergistic effects across properties and laws. Project page: https://lore-project.github.io/
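
To make the two properties concrete, here is a minimal sketch of how they might be checked for the compute law. It is an illustration under stated assumptions, not the LoRe-Bench implementation: reasoning compute is assumed to be measured as a count of reasoning tokens, monotonicity is approximated by a Spearman rank correlation against an ordinal complexity proxy, and compositionality is approximated by how far the compute for a composed question deviates from the sum of the computes for its two independent parts. The function names `monotonicity_score` and `compositionality_gap` are hypothetical.

```python
from statistics import mean


def _ranks(values):
    # Average 1-based ranks, so tied values share the same rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def monotonicity_score(complexity_levels, compute):
    """Spearman rank correlation between a complexity proxy and reasoning compute.

    Assumption: `complexity_levels` is only an ordinal proxy (e.g. number of
    required steps), since true question complexity is hard to quantify.
    Returns 0.0 when either input is constant (correlation undefined).
    """
    rx, ry = _ranks(complexity_levels), _ranks(compute)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def compositionality_gap(compute_a, compute_b, compute_ab):
    """Relative deviation from additivity for a composed question.

    Assumption: if compute is linear in complexity and the two sub-questions
    are independent, compute(a and b) should be close to compute(a) + compute(b).
    A gap of 0.0 means perfectly additive compute.
    """
    expected = compute_a + compute_b
    return abs(compute_ab - expected) / expected


if __name__ == "__main__":
    # Hypothetical token counts, for illustration only.
    print(monotonicity_score([1, 2, 3, 4], [100, 250, 300, 900]))
    print(compositionality_gap(200, 300, 800))
```

Under this sketch, a model can score well on monotonicity (compute rises with the complexity proxy) while still showing a large compositionality gap, which mirrors the pattern the abstract reports for most reasoning models.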