🤖 AI Summary
This study identifies fundamental capability boundaries of long-context language models (LCLMs) on multi-match and logical retrieval tasks, demonstrating that merely expanding context window size is insufficient. Method: We systematically construct a diverse retrieval benchmark, design a controllable-step reasoning mechanism, and establish a standardized evaluation framework to isolate the effects of reasoning steps and chain-of-thought (CoT) prompting. Contribution/Results: We formally characterize the “reasoning-step criticality” phenomenon—the first systematic identification of LCLMs’ retrieval capability thresholds—challenging the prevailing assumption that longer contexts inherently yield better performance. Empirical results show failure rates exceeding 90% under standard settings; with step-adapted CoT prompts, accuracy improves to over 95%, albeit at significantly increased computational cost. This work underscores the necessity of sufficient, controllable reasoning steps and task-specific CoT design—not just extended context—for effective retrieval in LCLMs.
📝 Abstract
Long-context language models (LCLMs), characterized by their extensive context windows, are becoming popular. However, despite being nearly perfect at standard long-context retrieval tasks, we find they are not equally capable across all types of retrieval tasks. Specifically, we identify two basic cases, "multi-matching retrieval" and "logic-based retrieval", which lie beyond LCLMs' capability boundary under normal settings. We further find that these cases can be well addressed with a specific number of reasoning steps, guided by specific CoT prompts, but doing so may cost too much time. We therefore propose a critical viewpoint: current LCLMs have no perfect solution for all types of retrieval tasks. Our work reveals novel properties of retrieval tasks and LCLMs, showing that long-context handling still has a long way to go.