🤖 AI Summary
Large language models (LLMs) suffer from performance degradation when processing ultra-long texts. Method: This paper proposes a noise-driven failure mechanism analysis framework, formally characterizing three critical noise sources: missing cross-chunk dependencies (task noise), internal model confusion induced by context expansion (model noise, which grows superlinearly with input length), and inaccurate integration of chunked outputs (aggregation noise). Based on this framework, we rigorously define the effective boundary of multi-agent divide-and-conquer processing and design cross-chunk dependency modeling and dynamic aggregation strategies. Contribution/Results: Experiments on long-text QA, retrieval, and summarization demonstrate that lightweight models, when equipped with optimized chunking and aggregation, significantly outperform GPT-4o's single-pass long-context inference. We establish, for the first time, a theoretically grounded noise taxonomy for long-context failure, providing a generalizable pathway for co-optimizing chunking and aggregation mechanisms.
📝 Abstract
We investigate the challenge of applying Large Language Models (LLMs) to long texts. We propose a theoretical framework that distinguishes the failure modes of long-context tasks into three categories: missing cross-chunk dependencies (task noise), confusion that grows with context size (model noise), and the imperfect integration of partial results (aggregator noise). Under this view, we analyze when multi-agent chunking is effective, i.e., dividing a lengthy sequence into smaller chunks and aggregating the processed results of each chunk. Our experiments on tasks such as retrieval, question answering, and summarization confirm both the theoretical analysis and the conditions that favor multi-agent chunking. By examining the superlinear growth of model noise with input length, we also explain why, for large inputs, a weaker model configured with chunk-based processing can surpass a more advanced model like GPT-4o applied in a single shot. Overall, we present a principled framework for understanding long-context failures, and our results highlight a direct pathway to handling long contexts in LLMs with carefully managed chunking and aggregation strategies.
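The divide-and-conquer pipeline the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the per-chunk step is a naive keyword filter standing in for a real LLM call, and the chunk size, overlap, and function names (`chunk_text`, `process_chunk`, `aggregate`) are all assumptions for the sake of the example. The overlap between chunks loosely mitigates task noise, and the deduplicating merge is a crude stand-in for the paper's dynamic aggregation.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into overlapping character chunks; the overlap reduces
    missing cross-chunk dependencies (task noise)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def process_chunk(chunk: str, query: str) -> list[str]:
    """Placeholder for a per-chunk LLM call: here, naive keyword retrieval."""
    return [line for line in chunk.splitlines() if query.lower() in line.lower()]

def aggregate(partials: list[list[str]]) -> list[str]:
    """Merge per-chunk outputs, deduplicating the artifacts produced by
    overlapping chunks (a crude stand-in for dynamic aggregation)."""
    seen, merged = set(), []
    for part in partials:
        for item in part:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

def chunked_answer(text: str, query: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Full pipeline: chunk, process each chunk independently, aggregate."""
    chunks = chunk_text(text, chunk_size, overlap)
    return aggregate([process_chunk(c, query) for c in chunks])
```

For instance, `chunked_answer("alpha one\nbeta two\nalpha three", "alpha", chunk_size=20, overlap=10)` splits the text so that lines straddle chunk boundaries, yet the aggregation step still returns each matching line exactly once.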