Long-Range Tasks Using Short-Context LLMs: Incremental Reasoning With Structured Memories

πŸ“… 2024-12-25
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Existing long-context reasoning methods suffer from high computational overhead, strong dependence on large-scale training data, and architectural complexity, hindering the simultaneous achievement of efficiency and performance. This paper introduces PRISM, a novel framework that pioneers an incremental reasoning paradigm grounded in structured memory: it processes long inputs in streaming chunks, constructs typed and hierarchical structured memories, and enables efficient context reuse. PRISM supports zero-shot schema generation and cross-task transfer without expanding the model’s context window or requiring retraining. It maintains robust performance even with minimal 500-token chunks and incurs no additional encoding overhead. Experiments demonstrate that PRISM reduces required context length by over 4Γ— compared to standard long-context models, cuts token cost by 54% relative to other short-context approaches, and significantly outperforms baselines across diverse long-range reasoning tasks.

πŸ“ Abstract
Long-range tasks require reasoning over long inputs. Existing solutions either need large compute budgets, training data, access to model weights, or use complex, task-specific approaches. We present PRISM, which alleviates these concerns by processing information as a stream of chunks, maintaining a structured in-context memory specified by a typed hierarchy schema. This approach demonstrates superior performance to baselines on diverse tasks while using at least 4x smaller contexts than long-context models. Moreover, PRISM is token-efficient. By producing short outputs and efficiently leveraging key-value (KV) caches, it achieves up to 54% cost reduction when compared to alternative short-context approaches. The method also scales down to tiny information chunks (e.g., 500 tokens) without increasing the number of tokens encoded or sacrificing quality. Furthermore, we show that it is possible to generate schemas to generalize our approach to new tasks with minimal effort.
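The abstract's core loop — stream the input as small chunks, have a short-context model extract typed facts into an in-context memory governed by a hierarchy schema, and reuse that memory across chunks — can be sketched as below. This is a minimal illustration, not the paper's implementation: the schema, the `StructuredMemory` class, and the `extract_facts` stub (a toy rule standing in for the actual LLM call) are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class StructuredMemory:
    # Typed hierarchy: entries are grouped under schema-defined types.
    schema: tuple[str, ...]
    entries: dict[str, list[str]] = field(default_factory=dict)

    def update(self, typed_facts: dict[str, list[str]]) -> None:
        for kind, facts in typed_facts.items():
            if kind not in self.schema:
                continue  # drop facts outside the schema
            bucket = self.entries.setdefault(kind, [])
            bucket.extend(f for f in facts if f not in bucket)

def chunk_stream(tokens: list[str], chunk_size: int = 500):
    # Stream the long input as small fixed-size chunks.
    for i in range(0, len(tokens), chunk_size):
        yield tokens[i:i + chunk_size]

def extract_facts(chunk: list[str], schema: tuple[str, ...]) -> dict[str, list[str]]:
    # Stand-in for the short-context LLM call that reads one chunk plus
    # the current memory and emits typed facts. Toy rule: capitalized
    # tokens become "entity" facts.
    return {"entity": sorted({t for t in chunk if t.istitle()})}

memory = StructuredMemory(schema=("entity", "event"))
text = "Alice met Bob in Paris . later Bob flew to Tokyo".split()
for chunk in chunk_stream(text, chunk_size=4):
    memory.update(extract_facts(chunk, memory.schema))
print(memory.entries["entity"])  # accumulated across all chunks
```

Because each step only encodes one chunk plus the compact memory, the context stays small regardless of total input length, which is what lets the approach scale down to tiny chunks without re-encoding earlier text.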
Problem

Research questions and friction points this paper addresses.

- Long-sequence Processing
- Efficiency
- Data Requirement

Innovation

Methods, ideas, or system contributions that make the work stand out.

- PRISM
- Key-Value Cache
- Long Information Processing