AI Summary
In multi-hop question answering (MHQA), large language models (LLMs) suffer from limited single-turn output capacity, hindering reliable integration of dispersed, interdependent evidence under noisy conditions and thereby degrading single-step reasoning accuracy. To address this, we first derive a capacity-aware theoretical accuracy upper bound grounded in Fano's inequality, formally exposing the fundamental tension between task complexity and model capacity. Building on this insight, we propose InfoQA, a multi-call reasoning framework that decomposes tasks, explicitly models inter-evidence dependencies, and actively prunes reasoning trajectories to ensure stability and robustness in high-noise settings. Experiments demonstrate that the theoretically derived capacity curve closely aligns with empirical performance; InfoQA achieves significant accuracy gains across multiple high-noise MHQA benchmarks, while exhibiting strong robustness and scalability.
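For intuition, the standard form of Fano's inequality already yields an accuracy ceiling of this kind. The sketch below states it for an answer variable uniform over M candidates, with natural logarithms; the paper's capacity-aware bound may refine this generic form, and the symbols here are the textbook ones, not necessarily the paper's notation.

```latex
% Fano's inequality: answer X uniform over M candidates, model output Y,
% estimator \hat{X}(Y), error probability P_e = \Pr[\hat{X}(Y) \neq X]:
H(X \mid Y) \le H_b(P_e) + P_e \log(M - 1)

% With H(X) = \log M, H(X \mid Y) = H(X) - I(X;Y), and H_b(P_e) \le \log 2,
% rearranging gives an upper bound on accuracy:
1 - P_e \le \frac{I(X;Y) + \log 2}{\log M}
```

Read this way, I(X;Y) plays the role of the model's per-pass capacity: once the task's effective log M outgrows it, the right-hand side drops below 1 and accuracy must collapse.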
Abstract
Multi-Hop Question Answering (MHQA) requires integrating dispersed, interdependent evidence through sequential reasoning under noise. This task is challenging for LLMs, which have a finite per-pass output capacity beyond which the integration of task-relevant evidence becomes unreliable. Consequently, the single-pass reasoning paradigm is inherently vulnerable to this capacity overflow. To formalize this bottleneck, our analysis establishes a Fano-style accuracy upper bound, defining a theoretical performance ceiling for single-pass LLMs. This bound reveals that accuracy inevitably collapses once task complexity exceeds model capacity, providing general principles for capacity-aware representation and structuring of MHQA in LLMs. Building on these principles, we introduce a proof-of-concept multi-call framework for MHQA, InfoQA. It ensures high per-step accuracy by combining capacity-aware task decomposition with active pruning of prior reasoning traces, keeping the information load within the single-pass limit. It further achieves robustness through a dependency-explicit workflow that enables precise control over the reasoning path. We construct a stringent and noise-rich benchmark to validate our theory and framework. Experimental results show that model behavior aligns with our predicted capacity curves while InfoQA achieves consistent performance improvements. We hope our work inspires more LLM multi-step reasoning methods. Code: https://github.com/KaiyangWan/InfoQA
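The decompose-then-prune workflow can be illustrated with a minimal sketch. This is not the InfoQA implementation: `call_llm` is a hard-coded stub standing in for a real model call, and the two-hop example question, the `depends_on` encoding, and all names here are hypothetical, chosen only to show the dependency-explicit, actively-pruned multi-call pattern the abstract describes.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call (hypothetical toy answers)."""
    table = {
        "Who directed Film X?": "Director D",
        "Where was Director D born?": "City C",
    }
    for question, answer in table.items():
        if question in prompt:
            return answer
    return "unknown"


def multi_call_qa(subquestions, depends_on):
    """Answer sub-questions in dependency order with active pruning.

    subquestions: templates whose {a<i>} slots take the answer to
    sub-question i (dependency-explicit workflow).
    depends_on[j]: indices of the answers sub-question j needs; only
    those are kept in prompt j (active pruning of prior traces), so
    each call's information load stays within the single-pass limit.
    """
    answers = []
    for j, template in enumerate(subquestions):
        # Prune: carry forward only the answers this step depends on.
        kept = {f"a{i}": answers[i] for i in depends_on[j]}
        answers.append(call_llm(template.format(**kept)))
    return answers[-1]


final = multi_call_qa(
    ["Who directed Film X?", "Where was {a0} born?"],
    depends_on=[[], [0]],
)
print(final)  # → City C
```

Note that hop 2 never sees hop 1's question, only its answer; in a real high-noise setting this is where pruning pays off, since irrelevant retrieved passages are dropped rather than accumulated across calls.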