🤖 AI Summary
Large language models (LLMs) fail on long-horizon, complex reasoning due to a fundamental bandwidth bottleneck in the attention mechanism, which restricts global information flow and limits holistic reasoning capacity.
Method: The authors propose the Bounded Attention Prefix Oracle (BAPO), a formal computational model that characterizes bandwidth constraints on attention heads and defines BAPO-hard reasoning problems—those that cannot be solved under constrained attention bandwidth.
Contribution/Results: They prove that chain-of-thought (CoT) decomposition can turn any BAPO-hard problem into a BAPO-easy one, providing a principled explanation for CoT's efficacy. Combining theoretical analysis with empirical evaluation of GPT-4, Claude, and Gemini on tasks such as graph reachability, they show that these models succeed on BAPO-easy tasks but fail even on relatively small BAPO-hard instances, consistent with the predicted bandwidth limits. The findings position attention bandwidth as a core bottleneck for global reasoning and frame CoT as a communication-aware decomposition strategy.
📝 Abstract
Despite their many successes, transformer-based large language models (LLMs) continue to struggle with tasks that require complex reasoning over large parts of their input. We argue that these failures arise due to capacity limits on the accurate flow of information within LLMs. To formalize this issue, we introduce the bounded attention prefix oracle (BAPO) model, a new computational framework that models bandwidth constraints on attention heads, the mechanism for internal communication in LLMs. We show that several important reasoning problems like graph reachability require high communication bandwidth for BAPOs to solve; we call these problems BAPO-hard. Our experiments corroborate our theoretical predictions: GPT-4, Claude, and Gemini succeed on BAPO-easy tasks and fail even on relatively small BAPO-hard tasks. BAPOs also reveal another benefit of chain of thought (CoT): we prove that breaking down a task using CoT can turn any BAPO-hard problem into a BAPO-easy one. Our results offer principled explanations for key LLM failures and suggest directions for architectures and inference methods that mitigate bandwidth limits.
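To make the graph-reachability task concrete, here is a minimal sketch of how such a benchmark instance could be generated and checked. This is not the paper's actual benchmark code; the graph sizes, prompt wording, and function names are illustrative assumptions.

```python
import random
from collections import deque


def is_reachable(edges, source, target):
    """Ground-truth answer via breadth-first search over directed edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    seen, queue = {source}, deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def make_reachability_instance(n_nodes=8, n_edges=12, seed=0):
    """Build a random directed graph plus a (source, target) query,
    returning a text prompt for an LLM and the ground-truth answer.
    Parameters here are illustrative, not the paper's settings."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n_edges:
        u, v = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if u != v:
            edges.add((u, v))
    source, target = rng.sample(range(n_nodes), 2)
    prompt = (
        "Edges: " + ", ".join(f"{u}->{v}" for u, v in sorted(edges))
        + f". Is node {target} reachable from node {source}? Answer yes or no."
    )
    return prompt, is_reachable(edges, source, target)
```

Answering such a query requires integrating edge information scattered across the whole prompt, which is exactly the kind of global information flow the BAPO model argues is bandwidth-limited.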