PERK: Long-Context Reasoning as Parameter-Efficient Test-Time Learning

📅 2025-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Long-context reasoning faces two challenges: severe sensitivity to noise in the input and high memory overhead during test-time learning. To address these, the authors propose a parameter-efficient test-time learning framework that encodes contextual information into lightweight adapter parameters, bypassing conventional prompting and full-parameter fine-tuning. Methodologically, the approach combines Low-Rank Adaptation (LoRA) with a bi-level meta-optimization scheme: an inner loop rapidly encodes the context into the adapter, while an outer loop meta-learns to recall and reason over the encoded information using the updated adapter. Evaluated on multiple long-context benchmarks, the method achieves absolute accuracy gains of up to 90% for small models and up to 27% for larger models, substantially outperforming prompt-based baselines while scaling inference memory more efficiently. The core contribution is an explicit mapping of context to learnable adapter parameters, enabling efficient, noise-resilient, and length-scalable long-context reasoning.
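The bi-level scheme described above can be illustrated in a toy setting. In the sketch below, the "base model" is a 1-D linear map with a frozen weight, and the "adapter" is a single scalar standing in for LoRA parameters; the inner loop encodes a context into the adapter by gradient descent, and the outer loop meta-learns the adapter initialization (here via a finite-difference meta-gradient rather than backpropagation through the inner loop). All names, hyperparameters, and tasks are hypothetical stand-ins for the paper's actual transformer + LoRA setup.

```python
# Toy sketch of PERK-style nested-loop meta-training (illustrative only).
# Base model: y = w * x with frozen w. Adapter: a scalar d (stand-in for
# LoRA parameters). Inner loop encodes a context into d; outer loop
# meta-learns the initialization d0 so queries are answered well after
# encoding. Hyperparameters below are arbitrary illustrative choices.

INNER_STEPS = 2
INNER_LR = 0.1
OUTER_LR = 0.05
EPOCHS = 50

w = 1.0  # frozen base-model weight

def inner_loop(d0, context):
    """Encode the context into the adapter by gradient descent on d."""
    d = d0
    for _ in range(INNER_STEPS):
        grad = sum(2 * ((w + d) * x - y) * x for x, y in context) / len(context)
        d -= INNER_LR * grad
    return d

def query_loss(d, queries):
    return sum(((w + d) * x - y) ** 2 for x, y in queries) / len(queries)

def adapted_loss(d0, task):
    """Loss on queries after the context has been encoded into the adapter."""
    return query_loss(inner_loop(d0, task["context"]), task["queries"])

# Two synthetic tasks: each context implies a slope (1.5 and 2.5) that
# the adapted model must recover to answer the queries.
tasks = [
    {"context": [(1.0, 1.5), (2.0, 3.0)], "queries": [(3.0, 4.5)]},
    {"context": [(1.0, 2.5), (2.0, 5.0)], "queries": [(3.0, 7.5)]},
]

d0 = 0.0  # meta-learned adapter initialization
before = sum(adapted_loss(d0, t) for t in tasks) / len(tasks)

# Outer loop: finite-difference meta-gradient through the inner loop.
for _ in range(EPOCHS):
    for task in tasks:
        eps = 1e-4
        g = (adapted_loss(d0 + eps, task) - adapted_loss(d0 - eps, task)) / (2 * eps)
        d0 -= OUTER_LR * g

after = sum(adapted_loss(d0, t) for t in tasks) / len(tasks)
```

After meta-training, the initialization sits between the two task slopes, so a couple of inner-loop steps suffice to adapt to either context; this mirrors how the outer loop in PERK shapes the base model and adapter so that test-time encoding is fast and accurate.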

📝 Abstract
Long-context reasoning requires accurately identifying relevant information in extensive, noisy input contexts. Previous research shows that using test-time learning to encode context directly into model parameters can effectively enable reasoning over noisy information. However, meta-learning methods for enabling test-time learning are prohibitively memory-intensive, preventing their application to long context settings. In this work, we propose PERK (Parameter Efficient Reasoning over Knowledge), a scalable approach for learning to encode long input contexts using gradient updates to a lightweight model adapter at test time. Specifically, PERK employs two nested optimization loops in a meta-training phase. The inner loop rapidly encodes contexts into a low-rank adapter (LoRA) that serves as a parameter-efficient memory module for the base model. Concurrently, the outer loop learns to use the updated adapter to accurately recall and reason over relevant information from the encoded long context. Our evaluations on several long-context reasoning tasks show that PERK significantly outperforms the standard prompt-based long-context baseline, achieving average absolute performance gains of up to 90% for smaller models (GPT-2) and up to 27% for our largest evaluated model, Qwen-2.5-0.5B. In general, PERK is more robust to reasoning complexity, length extrapolation, and the locations of relevant information in contexts. Finally, we show that while PERK is memory-intensive during training, it scales more efficiently at inference time than prompt-based long-context inference.
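The abstract's "parameter-efficient memory module" rests on the standard LoRA parameter-count argument: a full d×d weight update is replaced by the product of two rank-r factors, so only 2·d·r values are trained and stored. A minimal framework-free sketch, with illustrative sizes not taken from the paper:

```python
import random

# Minimal sketch of a LoRA-style low-rank adapter in pure Python.
# A full d x d weight update is replaced by B @ A with rank-r factors,
# so the trainable adapter holds 2*d*r parameters instead of d*d.
# Sizes here are illustrative, not the paper's configuration.

d, r = 64, 4  # hidden size and adapter rank

A = [[random.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d
B = [[0.0 for _ in range(r)] for _ in range(d)]                    # d x r, zero-init

def adapter_delta(x):
    """Compute (B @ A) @ x without materializing the d x d matrix."""
    ax = [sum(A[i][j] * x[j] for j in range(d)) for i in range(r)]      # A @ x
    return [sum(B[i][k] * ax[k] for k in range(r)) for i in range(d)]   # B @ (A @ x)

full_params = d * d      # 4096
lora_params = 2 * d * r  # 512
print(full_params, lora_params)  # prints: 4096 512
```

Zero-initializing B means the adapter starts as a no-op on the base model, which is the usual LoRA convention; test-time gradient updates then write the context into A and B.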
Problem

Research questions and friction points this paper is trying to address.

Efficiently encoding long, noisy contexts for reasoning
Reducing the memory cost of meta-learning for test-time learning
Improving recall and reasoning accuracy over long contexts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Encodes long contexts via test-time gradient updates to a lightweight adapter
Meta-trains with two nested optimization loops
Uses a low-rank adapter (LoRA) as a parameter-efficient memory module