Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations

📅 2024-06-19
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limitations of large language models (LLMs) in long-context question answering (degraded performance, context mismatch between demonstrations and the target query, and high token overhead), this paper proposes a context-recycling few-shot generation framework. The method introduces the long input context only once and automatically constructs multiple query-answer pairs from that same context to serve as demonstrations; it adds paragraph-identification instructions so the model explicitly names the supporting paragraph before answering, which improves both accuracy and answer attribution; and it shows that single-hop demonstrations generalize to multi-hop reasoning. Evaluated on multiple long-context QA benchmarks across several LLMs, the approach achieves an average absolute accuracy improvement of 16 points, and is particularly effective when the answer lies in the middle of a lengthy context. The result is an efficient, interpretable, and low-overhead form of few-shot adaptation within long documents.

📝 Abstract
Despite recent advancements in Large Language Models (LLMs), their performance on tasks involving long contexts remains sub-optimal. In-Context Learning (ICL) with few-shot examples may be an appealing solution to enhance LLM performance in this scenario; however, naïvely adding ICL examples with long context introduces challenges, including substantial token overhead added for each few-shot example and context mismatch between the demonstrations and the target query. In this work, we propose to automatically generate few-shot examples for long context QA tasks by recycling contexts. Specifically, given a long input context (1-3k tokens) and a query, we generate additional query-output pairs from the given context as few-shot examples, while introducing the context only once. This ensures that the demonstrations are leveraging the same context as the target query while only adding a small number of tokens to the prompt. We further enhance each demonstration by instructing the model to explicitly identify the relevant paragraphs before the answer, which improves performance while providing fine-grained attribution to the answer source. We apply our method on multiple LLMs and obtain substantial improvements (+16 absolute points on average across models) on various QA datasets with long context, especially when the answer lies within the middle of the context. Surprisingly, despite introducing only single-hop ICL examples, LLMs also successfully generalize to multi-hop long-context QA using our approach.
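The prompt layout the abstract describes (the context introduced once, followed by generated query-answer demonstrations that name their supporting paragraph, then the target query) can be sketched roughly as follows. This is a minimal illustration, assuming a simple numbered-paragraph format; the function name and exact template strings are not from the paper:

```python
def build_recycled_prompt(paragraphs, demos, target_query):
    """Assemble a long-context QA prompt in which the context appears
    only once, followed by generated few-shot demonstrations.

    paragraphs   -- list of context paragraphs (the long document, split up)
    demos        -- list of (question, paragraph_number, answer) tuples
                    generated from the same context
    target_query -- the real question to answer
    """
    # Number the paragraphs so each demonstration can point at its source.
    context = "\n\n".join(
        f"[Paragraph {i + 1}] {p}" for i, p in enumerate(paragraphs)
    )
    # Each demonstration names its supporting paragraph before the answer,
    # mirroring the explicit-attribution instruction from the abstract.
    demo_block = "\n\n".join(
        f"Question: {q}\nRelevant paragraph: [Paragraph {n}]\nAnswer: {a}"
        for q, n, a in demos
    )
    return (
        f"Context:\n{context}\n\n"
        f"{demo_block}\n\n"
        f"Question: {target_query}\n"
        f"Relevant paragraph:"
    )


prompt = build_recycled_prompt(
    ["Alice founded the lab in 2019.", "The lab moved to Berlin in 2021."],
    [("When was the lab founded?", 1, "2019")],
    "Where did the lab move?",
)
print(prompt)
```

Note that the context paragraphs occur exactly once in the assembled prompt, so each extra demonstration adds only a short question-answer pair rather than another copy of the document.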
Problem

Research questions and friction points this paper is trying to address.

Improving LLM performance on long-context QA tasks
Generating few-shot examples by recycling input contexts
Enhancing answer accuracy with explicit paragraph attribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recycles context to generate few-shot examples
Identifies relevant paragraphs before answering
Improves performance on long-context QA tasks
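The token-overhead argument behind the innovations above (demonstrations share one copy of the context instead of each carrying its own) can be illustrated with a back-of-the-envelope count. The figures below are hypothetical sizes chosen for illustration, not measurements from the paper:

```python
def naive_icl_cost(context_tokens, qa_tokens, n_demos):
    # Naive few-shot: every demonstration repeats its own full context,
    # and the target query carries the context one more time.
    return n_demos * (context_tokens + qa_tokens) + context_tokens

def recycled_cost(context_tokens, qa_tokens, n_demos):
    # Recycled few-shot: the context is introduced once and shared by
    # all demonstrations and the target query.
    return context_tokens + n_demos * qa_tokens

# Hypothetical sizes: a 2000-token context, 40-token Q-A pairs, 3 demos.
print(naive_icl_cost(2000, 40, 3))   # 8120 tokens
print(recycled_cost(2000, 40, 3))    # 2120 tokens
```

Under these assumed sizes, recycling the context keeps the prompt close to the cost of a zero-shot query, while naive ICL roughly multiplies it by the number of demonstrations.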