When Does Divide and Conquer Work for Long Context LLM? A Noise Decomposition Framework

📅 2025-06-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) suffer from performance degradation when processing ultra-long texts. Method: This paper proposes a noise-driven failure mechanism analysis framework, formally characterizing three critical noise sources: missing cross-chunk dependencies (task noise), internal model confusion induced by context expansion (model noise, superlinearly increasing), and inaccurate integration of chunked outputs (aggregation noise). Based on this framework, we rigorously define the effective boundary of multi-agent divide-and-conquer processing and design cross-chunk dependency modeling and dynamic aggregation strategies. Contribution/Results: Experiments on long-text QA, retrieval, and summarization demonstrate that lightweight models—equipped with optimized chunking and aggregation—significantly outperform GPT-4o’s single-pass long-context inference. We establish, for the first time, a theoretically grounded noise taxonomy for long-context failure, providing a generalizable pathway for co-optimizing chunking and aggregation mechanisms.

📝 Abstract
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that separates the failure modes of long-context tasks into three categories: missing cross-chunk dependencies (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when multi-agent chunking is effective, i.e., dividing a long sequence into smaller chunks and aggregating the processed results of each chunk. Our experiments on tasks such as retrieval, question answering, and summarization confirm both the theoretical analysis and the conditions that favor multi-agent chunking. By exploring superlinear model noise growth with input length, we also explain why, for large inputs, a weaker model configured with chunk-based processing can surpass a more advanced model like GPT-4o applied in a single shot. Overall, we present a principled framework for understanding long-context failure, and our results highlight a direct pathway to handling long contexts in LLMs with carefully managed chunking and aggregator strategies.
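The divide-and-conquer pattern the abstract describes can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: `process_chunk` is a placeholder for a per-chunk LLM call, and `aggregate` is a hypothetical simple merge; the paper's point is that each stage contributes its own noise term (overlap mitigates task noise, short chunks limit model noise, and the merge step introduces aggregation noise).

```python
def chunk_text(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size chunks. Optional overlap preserves some
    cross-chunk context, reducing what the paper calls task noise."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def process_chunk(chunk: str) -> str:
    # Placeholder for a per-chunk LLM call; model noise grows superlinearly
    # with input length, which is why short chunks can help.
    return chunk.strip()

def aggregate(partials: list[str]) -> str:
    # Placeholder for the aggregation step; imperfect merging of partial
    # results is the source of aggregation noise.
    return " ".join(p for p in partials if p)

def divide_and_conquer(text: str, chunk_size: int = 2048) -> str:
    return aggregate([process_chunk(c) for c in chunk_text(text, chunk_size)])
```

The trade-off the paper formalizes lives in the `chunk_size` and `overlap` parameters: smaller chunks reduce model noise but increase task and aggregation noise, so an effective boundary exists where chunking beats single-pass inference.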
Problem

Research questions and friction points this paper is trying to address.

Analyzing failure modes in long context LLM tasks
Evaluating effectiveness of multi-agent chunking strategy
Explaining weaker models outperforming advanced ones with chunking
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise decomposition framework for long context tasks
Multi-agent chunking for processing long sequences
Chunk-based strategies outperform single-shot advanced models
Zhen Xu (University of Chicago)
Shang Zhu (Together AI)
Jue Wang (Together AI)
Junlin Wang (Duke University)
Ben Athiwaratkun (Together AI)
Chi Wang (Google DeepMind)
James Zou (Stanford University)
Ce Zhang (Together AI)