🤖 AI Summary
To address context-length sensitivity in retrieval-augmented multi-document summarization, this paper proposes a novel method for dynamically estimating the optimal retrieval context length. Methodologically, it introduces a panel-driven estimation algorithm leveraging silver references generated by multiple large language models (LLMs), thereby eliminating reliance on static-length benchmarks such as RULER or HELMET. The approach integrates long-context LLMs (e.g., those supporting 128K+ tokens), retrieval-augmented generation (RAG), and explicit modeling of context-length sensitivity to ensure generalizability across diverse model architectures and scales. Experimental results demonstrate substantial improvements in ROUGE scores on multi-document summarization tasks, with consistent gains across both small and large models. Moreover, the method exhibits superior robustness in ultra-long-context scenarios compared to existing context-length estimation techniques.
📝 Abstract
Recent advances in the long-context reasoning abilities of language models have led to interesting applications in large-scale multi-document summarization. However, prior work has shown that these long-context models are not effective at their claimed context window sizes. Retrieval-augmented systems offer an efficient and effective alternative, but their performance can be highly sensitive to the choice of retrieval context length. In this work, we present a hybrid method that combines retrieval-augmented systems with the long context windows supported by recent language models. Our method first estimates the optimal retrieval length as a function of the retriever, summarizer, and dataset. On a randomly sampled subset of the dataset, we use a panel of LLMs to generate a pool of silver references, which we then use to estimate the optimal context length for a given RAG system configuration. Our results on the multi-document summarization task showcase the effectiveness of our method across model classes and sizes. We compare against length estimates derived from strong long-context benchmarks such as RULER and HELMET. Our analysis also highlights the effectiveness of our estimation method for very long-context LMs and its generalization to new classes of LMs.
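The estimation procedure described in the abstract can be sketched as a simple search: for each candidate retrieval context length, summarize a sampled subset with the RAG system at that length, score the outputs against the panel-generated silver references, and keep the length with the best average score. The function names, the token-overlap scoring stand-in, and the candidate length grid below are all illustrative assumptions, not the paper's actual API or metric.

```python
def panel_silver_references(document_set, panel):
    """Each panel LLM produces one silver reference summary for the document set.
    (Here `panel` is any list of callables standing in for LLM calls.)"""
    return [llm(document_set) for llm in panel]

def score_against_pool(candidate, references):
    """Stand-in similarity: mean unigram-overlap F1 against the silver pool.
    The paper would use a proper summarization metric instead."""
    def f1(a, b):
        ta, tb = set(a.split()), set(b.split())
        if not ta or not tb:
            return 0.0
        overlap = len(ta & tb)
        p, r = overlap / len(ta), overlap / len(tb)
        return 2 * p * r / (p + r) if (p + r) else 0.0
    return sum(f1(candidate, ref) for ref in references) / len(references)

def estimate_optimal_length(sampled_docs, rag_summarize, panel, candidate_lengths):
    """Pick the retrieval context length whose RAG summaries best match the
    silver-reference pool, averaged over the sampled document sets."""
    best_len, best_score = None, float("-inf")
    for length in candidate_lengths:
        total = 0.0
        for docs in sampled_docs:
            silver = panel_silver_references(docs, panel)
            total += score_against_pool(rag_summarize(docs, length), silver)
        avg = total / len(sampled_docs)
        if avg > best_score:
            best_len, best_score = length, avg
    return best_len
```

In practice `rag_summarize` would retrieve up to `length` tokens of context before summarizing, and the silver references would be cached rather than regenerated per candidate length; the sketch keeps the loop structure explicit for clarity.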