🤖 AI Summary
To address the high latency and heavy cognitive load that long-context retrieval-augmented generation (RAG) imposes on large language models (LLMs) in multi-hop question answering, this paper proposes BRIEF-Pro, a general-purpose, lightweight abstractive compressor. The model is trained only on short seed contexts (fewer than 1k words) yet compresses ultra-long inputs (over 10k words), demonstrating effective short-to-long context generalization. It combines few-shot distillation with end-to-end multi-hop QA optimization and supports user-controllable summary length. Evaluated on four open-domain multi-hop QA benchmarks with a 70B reader model, BRIEF-Pro at 32× context compression improves average QA accuracy by 4.67% over LongLLMLingua at 9× compression, while incurring only 23% of its computational overhead. The framework substantially improves both inference efficiency and cross-context generalization.
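As a concrete reading of the 32× figure, compression ratio is conventionally the length of the original retrieved context divided by the length of the compressed summary. A minimal sketch of that measurement (the word-level length measure is an assumption for illustration; the paper may count model tokens instead):

```python
def compression_ratio(original: str, summary: str) -> float:
    # Assumption: length measured in whitespace-delimited words;
    # the paper may count model tokens instead.
    return len(original.split()) / max(len(summary.split()), 1)

# e.g. a 9,600-word retrieval compressed to a 300-word summary -> 32.0
```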
📝 Abstract
As retrieval-augmented generation (RAG) tackles complex tasks, increasingly expanded contexts offer richer information, but at the cost of higher latency and increased cognitive load on the model. To mitigate this bottleneck, especially for intricate multi-hop questions, we introduce BRIEF-Pro, a universal, lightweight compressor that distills the evidence relevant to a given query from retrieved documents into a concise summary for seamless integration into in-context RAG. Using seed data consisting of relatively short contexts (fewer than 1k words), BRIEF-Pro is trained to perform abstractive compression of extended contexts exceeding 10k words across a wide range of scenarios. Furthermore, BRIEF-Pro offers flexible user control over summary length by allowing users to specify the desired number of sentences. Experiments on four open-domain multi-hop question-answering datasets show that BRIEF-Pro generates more concise and relevant summaries, enhancing performance across small, large, and proprietary language models. With a 70B reader model, 32× compression by BRIEF-Pro improves QA performance by 4.67% on average over LongLLMLingua's 9× compression, while requiring only 23% of its computational overhead.
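To make the compress-then-read workflow concrete, below is a minimal sketch of how a query-aware compressor like BRIEF-Pro could slot into an in-context RAG pipeline, including the sentence-count control the abstract describes. The prompt templates and the `compress`/`answer` helpers are illustrative assumptions, not BRIEF-Pro's actual interface.

```python
from typing import Callable, List

# Hypothetical interface: `compressor` and `reader` stand in for any
# text-in/text-out LLM callable; these names are assumptions, not the
# paper's API.
LLM = Callable[[str], str]

def compress(compressor: LLM, query: str, docs: List[str], n_sentences: int) -> str:
    """Abstractively compress retrieved documents into a query-focused
    summary whose length is set by a user-specified sentence count."""
    context = "\n\n".join(docs)
    prompt = (
        f"Summarize the evidence needed to answer the question "
        f"in at most {n_sentences} sentences.\n\n"
        f"Question: {query}\n\nDocuments:\n{context}\n\nSummary:"
    )
    return compressor(prompt)

def answer(reader: LLM, query: str, summary: str) -> str:
    """The reader sees only the compact summary, not the full retrieval."""
    prompt = f"Context: {summary}\n\nQuestion: {query}\n\nAnswer:"
    return reader(prompt)

# Usage with any two LLM callables:
#   summary = compress(compressor, question, retrieved_docs, n_sentences=3)
#   prediction = answer(reader, question, summary)
```

The design point worth noting is that the reader model never sees the raw retrieval; only the compressed summary enters its context window, which is where the latency and cognitive-load savings come from.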