Long-context Language Models Cannot Retrieve Without Sufficient Steps

📅 2024-10-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study identifies fundamental capability boundaries of long-context language models (LCLMs) on multi-match and logical retrieval tasks, demonstrating that merely expanding context window size is insufficient. Method: We systematically construct a diverse retrieval benchmark, design a controllable-step reasoning mechanism, and establish a standardized evaluation framework to isolate the effects of reasoning steps and chain-of-thought (CoT) prompting. Contribution/Results: We formally characterize the “reasoning-step criticality” phenomenon—the first systematic identification of LCLMs’ retrieval capability thresholds—challenging the prevailing assumption that longer contexts inherently yield better performance. Empirical results show failure rates exceeding 90% under standard settings; with step-adapted CoT prompts, accuracy improves to over 95%, albeit at significantly increased computational cost. This work underscores the necessity of sufficient, controllable reasoning steps and task-specific CoT design—not just extended context—for effective retrieval in LCLMs.

📝 Abstract
Long-context language models (LCLMs), characterized by their extensive context windows, are becoming popular. However, although they are nearly perfect at standard long-context retrieval tasks, we find they are not good at all types of retrieval tasks. Specifically, we identify two basic cases, "multi-matching retrieval" and "logic-based retrieval", which lie beyond LCLMs' capability boundary under normal settings. We then find that these cases can be well addressed with a specific number of reasoning steps, guided by specific CoT prompts, though this may cost too much time. We therefore propose a critical viewpoint: there is currently no perfect solution for LCLMs to solve all types of retrieval tasks. Our work reveals some novel properties of retrieval tasks and LCLMs, showing that long-context handling still has a long way to go.
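The "multi-matching retrieval" case described in the abstract can be illustrated with a small synthetic task. The sketch below is a hypothetical construction, not the paper's actual benchmark (the function name, key-value format, and prompt wording are all assumptions): a single target value is planted under several keys, and the prompt asks the model to check the pairs one by one, i.e. to spend one explicit reasoning step per pair, in the spirit of the step-controlled CoT prompting the paper describes.

```python
import random
import string

def make_multi_match_context(num_pairs=20, num_matches=3, seed=0):
    """Build a synthetic key-value context in which one target value
    appears under several keys; the model must return ALL matching keys.

    Hypothetical sketch of a multi-matching retrieval task; not the
    paper's real benchmark format.
    """
    rng = random.Random(seed)
    keys = ["".join(rng.choices(string.ascii_lowercase, k=6)) for _ in range(num_pairs)]
    values = ["".join(rng.choices(string.digits, k=6)) for _ in range(num_pairs)]

    # Plant the same value under `num_matches` distinct keys.
    match_idx = rng.sample(range(num_pairs), num_matches)
    target = values[match_idx[0]]
    for i in match_idx:
        values[i] = target

    context = "\n".join(f"Key: {k}, Value: {v}" for k, v in zip(keys, values))
    # Gold answer: every key whose value equals the target.
    gold = sorted(k for k, v in zip(keys, values) if v == target)

    # Step-controlled CoT prompt: one numbered check per pair, so the
    # number of reasoning steps scales with the number of pairs.
    prompt = (
        f"{context}\n\n"
        f"List ALL keys whose value is {target}. "
        f"Check the pairs one by one, numbering each check, before answering."
    )
    return prompt, gold

prompt, gold = make_multi_match_context()
print(len(gold))  # number of keys the model must recover
```

Without the "check the pairs one by one" instruction, the paper's results suggest an LCLM tends to stop after the first match; forcing one step per pair trades inference time for completeness.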
Problem

Research questions and friction points this paper is trying to address.

Long Context Language Models
Multi-matching Retrieval
Logic-based Retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Long Context Language Models
Incremental Prompting
Complex Search Tasks
Yijiong Yu
Master Student, Tsinghua University
Natural Language Processing · Machine Learning
Ma Xiufa
Chinaunicom Software
Jianwei Fang
Chinaunicom Software
Zhi Xu
Fudan University
Guangyao Su
Chinaunicom Software
Jiancheng Wang
Chinaunicom Software
Yo-Ping Huang
Tsinghua University
Zhixiao Qi
Tsinghua University
Wei Wang
OpenCSG
Weifeng Liu
University of Florida
Machine Learning · Signal Processing · Kernel Adaptive Filtering
Ran Chen
OpenCSG
Ji Pei
OpenCSG