Reasoning Effort and Problem Complexity: A Scaling Analysis in LLMs

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how large language models’ (LLMs) inference effort scales with problem complexity, exposing fundamental limitations in logical consistency. Method: We design a controllable benchmark using the Tents puzzle—a combinatorial logic task with known linear-time solvability and arbitrarily scalable complexity—and employ multi-scale prompting alongside fine-grained analysis of reasoning traces to quantify inference effort via token consumption, step count, and retry rate. Contribution/Results: We report the first empirical evidence of non-monotonic scaling: inference effort saturates—and even declines—beyond a critical complexity threshold, contradicting the implicit assumption that harder problems invariably demand more reasoning. Moreover, mainstream reasoning models exhibit pronounced performance divergence under high complexity. These findings challenge prevailing intuitions about LLM reasoning scalability and establish a novel, theoretically grounded benchmark for evaluating and modeling the limits of LLM inference capacity.
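The summary's three effort metrics (token consumption, step count, retry rate) can be extracted from a raw reasoning trace. The heuristics below are illustrative assumptions, not the paper's actual instrumentation: tokens are approximated by whitespace splitting, steps by enumerated lines, and retries by backtracking phrases.

```python
import re

def effort_metrics(trace: str) -> dict:
    """Quantify inference effort from a reasoning trace (heuristic sketch)."""
    tokens = len(trace.split())  # crude proxy for token consumption
    # Count lines that begin a numbered reasoning step, e.g. "Step 3:" or "3."
    steps = len(re.findall(r"(?m)^\s*(?:Step\s*\d+|\d+[.)])", trace))
    # Count backtracking cues as a proxy for retries.
    retries = len(re.findall(
        r"\b(wait|let me reconsider|actually|re-?check)\b",
        trace, flags=re.IGNORECASE))
    return {"tokens": tokens, "steps": steps, "retries": retries}

trace = """Step 1: place a tent next to the tree at (0, 0).
Step 2: row 1 needs one more tent.
Wait, that violates the adjacency rule. Let me reconsider.
Step 3: move the tent to (0, 2)."""
print(effort_metrics(trace))
```

Aggregating these metrics across puzzle sizes is what exposes the non-monotonic scaling the summary describes.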

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable text generation capabilities, and recent advances in training paradigms have led to breakthroughs in their reasoning performance. In this work, we investigate how the reasoning effort of such models scales with problem complexity. We use the infinitely scalable Tents puzzle, which has a known linear-time solution, to analyze this scaling behavior. Our results show that reasoning effort scales with problem size, but only up to a critical problem complexity. Beyond this threshold, the reasoning effort does not continue to increase, and may even decrease. This observation highlights a critical limitation in the logical coherence of current LLMs as problem complexity increases, and underscores the need for strategies to improve reasoning scalability. Furthermore, our results reveal significant performance differences between current state-of-the-art reasoning models when faced with increasingly complex logical puzzles.
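For readers unfamiliar with the benchmark: in the Tents puzzle, tents are placed on a grid of trees so that each tent sits orthogonally next to a tree, no two tents touch (even diagonally), and per-row/per-column tent counts match given clues. A minimal solution checker is sketched below; the grid encoding and the example instance are assumptions for illustration, and the one-to-one tree-tent pairing constraint is omitted for brevity.

```python
def check_tents(grid, row_clues, col_clues):
    """Check basic Tents-puzzle constraints on a solved grid.

    grid: list of equal-length strings with 'T' (tree), 'A' (tent), '.' (empty).
    """
    n, m = len(grid), len(grid[0])
    tents = [(r, c) for r in range(n) for c in range(m) if grid[r][c] == "A"]

    # Row and column tent counts must match the clues.
    if any(grid[r].count("A") != row_clues[r] for r in range(n)):
        return False
    if any(sum(grid[r][c] == "A" for r in range(n)) != col_clues[c]
           for c in range(m)):
        return False

    for r, c in tents:
        # Every tent must be orthogonally adjacent to at least one tree.
        if not any(0 <= r + dr < n and 0 <= c + dc < m
                   and grid[r + dr][c + dc] == "T"
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))):
            return False
        # No two tents may touch, even diagonally.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < n and 0 <= cc < m \
                        and grid[rr][cc] == "A":
                    return False
    return True

# A valid 3x3 instance (assumed for illustration).
grid = ["AT.",
        "...",
        ".TA"]
print(check_tents(grid, [1, 0, 1], [1, 0, 1]))  # True
```

Because instances like this can be generated at any grid size and solved in linear time, the puzzle gives a ground truth against which reasoning effort can be measured as complexity grows.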
Problem

Research questions and friction points this paper is trying to address.

Analyzes scaling of reasoning effort with problem complexity in LLMs.
Identifies a critical complexity threshold beyond which reasoning effort plateaus or even declines.
Highlights performance differences among state-of-the-art reasoning models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes how reasoning effort scales with problem complexity
Uses the Tents puzzle, which has a known linear-time solution, as an arbitrarily scalable benchmark
Identifies a critical complexity threshold in current LLMs