🤖 AI Summary
To address the finite context windows of large language models (LLMs) and their inefficiency on ultra-long contexts, this paper proposes a cognitively inspired memory-transformation paradigm. It treats the input context as an analogue of human short-term memory and introduces a three-stage mechanism (context knowledge elicitation, salient-information selection, and parameter-level knowledge consolidation) that converts contextual information into persistent parameter updates, mapping context to parameters in the way short-term memory is consolidated into long-term memory. The method requires no fine-tuning or additional parameters and theoretically supports integrating unbounded context. Experiments show a 90% reduction in context length; across factual recall, grounded reasoning, and skill-acquisition tasks, the method reaches 103% of full-context prompting's average performance. On real-world documents of up to 2 million tokens, it surpasses full-context prompting while using only 0.4% of the original context.
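The three-stage pipeline above can be sketched conceptually. This is a toy illustration under loose assumptions, not the paper's implementation: the `ParameterStore` dict stands in for actual LLM weight updates, and the salience score is a simple term-overlap heuristic; all names here are hypothetical.

```python
# Conceptual sketch of the context-to-parameter transformation:
# elicitation -> selection -> consolidation. Illustrative only; the
# real method consolidates knowledge into model parameters.

def elicit(context: str) -> list[str]:
    """Stage 1: break the raw context into candidate knowledge units."""
    return [s.strip() for s in context.split(".") if s.strip()]

def select(units: list[str], query_terms: set[str], keep_ratio: float = 0.1) -> list[str]:
    """Stage 2: keep only the most salient units (here scored by term overlap),
    yielding roughly a (1 - keep_ratio) compression of the context."""
    scored = sorted(units, key=lambda u: -len(query_terms & set(u.lower().split())))
    k = max(1, int(len(units) * keep_ratio))
    return scored[:k]

class ParameterStore:
    """Stage 3: consolidate selected knowledge persistently (a dict stands in
    for parameter updates), so the original context can be discarded."""
    def __init__(self):
        self.memory: dict[str, str] = {}

    def consolidate(self, units: list[str]) -> None:
        for u in units:
            self.memory[u.lower()] = u

    def recall(self, term: str) -> list[str]:
        return [u for u in self.memory.values() if term.lower() in u.lower()]

context = ("Paris is the capital of France. The Seine flows through Paris. "
           "Cats sleep a lot. Bananas are yellow. The Louvre is in Paris. "
           "Rain falls in spring. Paris hosted the 2024 Olympics. "
           "Snails move slowly. Tea contains caffeine. Paris has 20 arrondissements.")

units = elicit(context)                                   # 10 knowledge units
salient = select(units, query_terms={"paris"})            # keep the top 10%
store = ParameterStore()
store.consolidate(salient)
# After consolidation, queries are answered without the original context:
print(store.recall("paris"))
```

The point of the sketch is the interface, not the internals: once stage 3 runs, the full context is no longer needed at inference time, which is what lets context length shrink by ~90% while answers remain recoverable.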
📝 Abstract
In-context learning (ICL) is critical for large language models (LLMs), but its effectiveness is constrained by finite context windows, particularly in ultra-long contexts. To overcome this, we introduce InfiniteICL, a framework that parallels context and parameters in LLMs with short- and long-term memory in human cognitive systems, focusing on transforming temporary context knowledge into permanent parameter updates. This approach significantly reduces memory usage, maintains robust performance across varying input lengths, and theoretically enables infinite context integration through the principles of context knowledge elicitation, selection, and consolidation. Evaluations demonstrate that our method reduces context length by 90% while reaching 103% of the average performance of full-context prompting across fact recall, grounded reasoning, and skill acquisition tasks. When conducting sequential multi-turn transformations on complex, real-world contexts (up to 2M tokens long), our approach surpasses full-context prompting while using only 0.4% of the original context. These findings highlight InfiniteICL's potential to enhance the scalability and efficiency of LLMs by breaking the limitations of conventional context window sizes.