ToM: Leveraging Tree-oriented MapReduce for Long-Context Reasoning in Large Language Models

📅 2025-11-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Limited context windows in large language models (LLMs) degrade long-text reasoning performance. Existing retrieval-augmented generation (RAG) and divide-and-conquer (DCF) approaches struggle to simultaneously preserve logical coherence and model long-range dependencies. This paper proposes ToM, a document-structure-aware Tree-oriented MapReduce framework. ToM first constructs a semantic document tree (DocTree) via hierarchical semantic parsing; it then recursively applies Map operations, in which leaf nodes generate local reasoning chains, and Reduce operations, in which internal nodes aggregate logically consistent conclusions from their children. Crucially, ToM explicitly incorporates document hierarchy into the reasoning process, mitigating the logical conflicts and dependency fragmentation inherent in conventional DCF. Experiments on models with ≥70B parameters demonstrate that ToM significantly improves logical consistency and accuracy in long-text reasoning, outperforming state-of-the-art RAG and DCF methods. The implementation is publicly available.

📝 Abstract
Large Language Models (LLMs), constrained by limited context windows, often face significant performance degradation when reasoning over long contexts. To address this, Retrieval-Augmented Generation (RAG) retrieves and reasons over chunks but frequently sacrifices logical coherence due to its reliance on similarity-based rankings. Similarly, divide-and-conquer frameworks (DCF) split documents into small chunks for independent reasoning and aggregation. While effective for local reasoning, DCF struggles to capture long-range dependencies and risks inducing conflicts by processing chunks in isolation. To overcome these limitations, we propose ToM, a novel Tree-oriented MapReduce framework for long-context reasoning. ToM leverages the inherent hierarchical structure of long documents (e.g., main headings and subheadings) by constructing a DocTree through hierarchical semantic parsing and performing bottom-up aggregation. Using a Tree MapReduce approach, ToM enables recursive reasoning: in the Map step, rationales are generated at child nodes; in the Reduce step, these rationales are aggregated across sibling nodes to resolve conflicts or reach consensus at parent nodes. Experimental results on 70B+ LLMs show that ToM significantly outperforms existing divide-and-conquer frameworks and retrieval-augmented generation methods, achieving better logical coherence and long-context reasoning. Our code is available at https://github.com/gjn12-31/ToM.
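The recursive Map/Reduce scheme described in the abstract can be sketched as a simple tree traversal. The snippet below is a minimal illustration, not the paper's actual implementation: `DocNode`, `summarize`, and `tree_map_reduce` are hypothetical names, and `summarize` stands in for an LLM call that would generate a rationale or aggregate child rationales.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DocNode:
    """One node of a DocTree: a heading plus either leaf text or children."""
    heading: str
    text: str = ""
    children: List["DocNode"] = field(default_factory=list)

def summarize(prompt: str) -> str:
    # Placeholder for an LLM call; here it just tags the prompt so the
    # recursion is visible. A real system would return model output.
    return f"rationale[{prompt[:40]}]"

def tree_map_reduce(node: DocNode) -> str:
    if not node.children:
        # Map step: generate a local reasoning chain at a leaf node.
        return summarize(f"Reason over '{node.heading}': {node.text}")
    # Recurse bottom-up, then Reduce: aggregate sibling rationales at the
    # parent node, resolving conflicts or reaching consensus.
    child_rationales = [tree_map_reduce(c) for c in node.children]
    return summarize(f"Aggregate under '{node.heading}': "
                     + " | ".join(child_rationales))
```

Because Reduce only ever sees the rationales of sibling subtrees, each LLM call stays within a bounded context while the hierarchy carries long-range structure up to the root.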
Problem

Research questions and friction points this paper is trying to address.

Addresses performance degradation when LLMs reason beyond their limited context windows
Recovers the logical coherence lost by similarity-based retrieval rankings in RAG
Captures long-range dependencies that isolated chunk processing in DCF misses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Tree MapReduce enables recursive reasoning in LLMs
Hierarchical semantic parsing constructs document tree structures
Bottom-up aggregation resolves conflicts across sibling nodes
Jiani Guo
School of Computer Science, Wuhan University, Wuhan, China
Zuchao Li
Wuhan University
Natural Language Processing, Machine Learning
Jie Wu
Tsinghua University
Qianren Wang
Shanghai Huawei Technologies, China
Yun Li
Cognitive AI Lab
Lefei Zhang
School of Computer Science, Wuhan University
Pattern Recognition, Machine Learning, Image Processing, Remote Sensing
Hai Zhao
School of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Yujiu Yang
SIGS, Tsinghua University
Machine Learning, Natural Language Processing, Computer Vision