Reinforced Fast Weights with Next-Sequence Prediction

📅 2026-02-18


🤖 AI Summary

Existing fast weight architectures are limited by the token-level next-token prediction (NTP) training paradigm, which struggles to capture semantic coherence across multiple tokens and long-range dependencies. This work proposes REFINE, a framework that trains fast weight models with sequence-level reinforcement learning under a next-sequence prediction (NSP) objective. REFINE selects high-entropy token positions, generates multi-token rollouts from them, assigns self-supervised sequence-level rewards, and optimizes the model with Group Relative Policy Optimization (GRPO). It can be applied during mid-training, post-training, and test-time training. By moving beyond single-step prediction, the method consistently outperforms supervised fine-tuning with NTP on both LaCT-760M and DeltaNet-1.3B across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench.
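The position-selection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the model exposes next-token logits at each position and that REFINE keeps the top-k positions by predictive entropy. The top-k rule, the value of k, and the function names (`select_high_entropy_positions` and its helpers) are hypothetical choices made for this example; the source only says positions are chosen "based on prediction entropy".

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_high_entropy_positions(
    logits_per_position: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Hypothetical rule: return the k positions whose next-token
    distribution has the highest entropy, in ascending position order.
    These would be the starting points for multi-token rollouts."""
    ents = [entropy(softmax(logits)) for logits in logits_per_position]
    # sorted() is stable, so ties keep their original (left-to-right) order.
    ranked = sorted(range(len(ents)), key=lambda i: ents[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Toy logits over a 3-token vocabulary at 4 positions.
    toy = [[10.0, 0.0, 0.0],  # confident prediction, low entropy
           [1.0, 1.0, 1.0],   # uniform, maximal entropy
           [2.0, 1.0, 0.0],   # moderately uncertain
           [0.0, 0.0, 0.0]]   # uniform, maximal entropy
    print(select_high_entropy_positions(toy, k=2))
```

Here positions 1 and 3 are uniform over the vocabulary, so the sketch picks them: these are the points where a single-token target says least about what follows, which is the intuition for spending rollout budget there.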


📝 Abstract

Fast weight architectures offer a promising alternative to attention-based transformers for long-context modeling by maintaining constant memory overhead regardless of context length. However, their potential is limited by the next-token prediction (NTP) training paradigm. NTP optimizes single-token predictions and ignores semantic coherence across multiple tokens following a prefix. Consequently, fast weight models, which dynamically update their parameters to store contextual information, learn suboptimal representations that fail to capture long-range dependencies. We introduce REFINE (Reinforced Fast weIghts with Next sEquence prediction), a reinforcement learning framework that trains fast weight models under the next-sequence prediction (NSP) objective. REFINE selects informative token positions based on prediction entropy, generates multi-token rollouts, assigns self-supervised sequence-level rewards, and optimizes the model with group relative policy optimization (GRPO). REFINE is applicable throughout the training lifecycle of pre-trained language models: mid-training, post-training, and test-time training. Our experiments on LaCT-760M and DeltaNet-1.3B demonstrate that REFINE consistently outperforms supervised fine-tuning with NTP across needle-in-a-haystack retrieval, long-context question answering, and diverse tasks in LongBench. REFINE provides an effective and versatile framework for improving long-context modeling in fast weight architectures.
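The reward-and-advantage step of GRPO can be sketched as follows. GRPO samples a group of rollouts for the same prefix and normalizes each rollout's reward against the group's mean and standard deviation, so no learned value network is needed. The reward below is a hypothetical stand-in: the abstract does not specify REFINE's self-supervised sequence-level reward, so `match_reward` simply scores the fraction of rollout tokens that equal the true continuation from the training text. The population standard deviation and the `eps` term are also assumptions; implementations differ on both.

```python
import math
from typing import List, Sequence


def match_reward(rollout: Sequence[int], reference: Sequence[int]) -> float:
    """Hypothetical self-supervised reward: fraction of positions where the
    rollout token equals the ground-truth next token in the training text."""
    if not reference:
        return 0.0
    hits = sum(1 for a, b in zip(rollout, reference) if a == b)
    return hits / len(reference)


def group_relative_advantages(
    rewards: Sequence[float], eps: float = 1e-6
) -> List[float]:
    """GRPO-style advantages: standardize each reward within its group."""
    g = len(rewards)
    mean = sum(rewards) / g
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / g)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    reference = [5, 7, 9, 11]           # true 4-token continuation
    group = [[5, 7, 9, 11],             # exact match
             [5, 7, 0, 0],              # half correct
             [1, 2, 3, 4],              # all wrong
             [5, 0, 9, 0]]              # half correct
    rewards = [match_reward(r, reference) for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Rollouts that beat their group's average receive positive advantages and those below it receive negative ones. In GRPO these advantages then weight each rollout's token log-probability ratios inside a clipped, PPO-style objective; that policy-gradient step is omitted here.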

Problem

Research questions and friction points this paper is trying to address.

fast weight architectures

next-token prediction

long-context modeling

semantic coherence

long-range dependencies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast Weights

Next-Sequence Prediction

Reinforcement Learning