Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory

📅 2026-05-13

🤖 AI Summary

Explicit memory architectures in language modeling often fail to scale because of gradient instability during backpropagation through time. This work proposes Phasor Memory Networks (PMNet), which structurally stabilize memory dynamics through a phase-rotation update on the complex unit circle and Hierarchical Learnable Anchors, preserving gradient norms without requiring special initialization. This makes large explicit memories trainable: on a synthetic byte-level copy-paste task, PMNet reaches near-perfect (≈100%) exact retrieval across temporal distances well beyond the receptive field of the local sliding-window attention. With only 119 million parameters, PMNet also matches the zero-shot long-context robustness of a Mamba model three times its size.
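Why a phase rotation keeps gradients stable can be shown with a toy sketch. It is not PMNet's actual update rule, which this summary does not specify. It assumes a single complex state z on the unit circle that is multiplied at each step by e^{iθ_t}, where θ_t is a hypothetical stand-in for a learned, input-driven phase. Because each factor has modulus 1, both the state and the Jacobian dz_T/dz_0 (the product of the factors) keep modulus 1. A real-valued recurrence h_{t+1} = w·h_t, shown for contrast, has Jacobian w^T, which vanishes for |w| < 1 and explodes for |w| > 1.

```python
import cmath
import math
import random


def rotate(z: complex, angle: float) -> complex:
    """Rotate a phasor by `angle` radians; multiplying by e^{i*angle} is unitary."""
    return z * cmath.exp(1j * angle)


def run_phasor(num_steps: int, seed: int = 0) -> tuple[complex, complex]:
    """Apply `num_steps` rotations with hypothetical random phases.

    Returns the final state and the accumulated Jacobian dz_T/dz_0, which for
    z_{t+1} = e^{i*angle_t} * z_t is the product of the rotation factors.
    """
    rng = random.Random(seed)
    z = cmath.exp(1j * 0.3)  # start on the unit circle
    jacobian = 1 + 0j
    for _ in range(num_steps):
        angle = rng.uniform(-math.pi, math.pi)  # hypothetical input-driven phase
        z = rotate(z, angle)
        jacobian *= cmath.exp(1j * angle)
    return z, jacobian


def run_real(num_steps: int, weight: float) -> float:
    """Contrast: the Jacobian of h_{t+1} = weight * h_t is weight ** num_steps."""
    return weight ** num_steps


z, jac = run_phasor(10_000)
print(abs(z), abs(jac))        # both stay at 1.0 up to floating-point rounding
print(run_real(10_000, 0.99))  # vanishes toward 0
print(run_real(1_000, 1.01))   # explodes (about 2.1e4)
```

The unitary case needs no special initialization to stay bounded, which is the property the summary attributes to PMNet; the real-valued case only stays bounded if w is tuned to exactly 1.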

📝 Abstract

For over a decade, explicit memory architectures like the Neural Turing Machine have remained theoretically appealing yet practically intractable for language modeling due to catastrophic gradient instability during Backpropagation Through Time. In this work, we break this stalemate with Phasor Memory Network (PMNet), a novel architecture that structurally resolves memory volatility through Unitary Phasor Dynamics and Hierarchical Learnable Anchors. Rather than relying on brute-force scaling, we present a mechanistic proof-of-concept in a controlled byte-level setting. By constraining recurrent state updates to phase rotations on a complex unit circle, PMNet preserves gradient norms and inherently prevents divergence without the need for specialized initialization. We empirically demonstrate the active actuation of the memory module through a synthetic Copy-Paste task, where PMNet uses an expansive 85-slot hierarchical memory tree (85 = 4^0 + 4^1 + 4^2 + 4^3 = 1 + 4 + 16 + 64 slots over four levels) to achieve near-100% exact retrieval across temporal distances that completely exceed the receptive field of the local sliding-window attention. Furthermore, despite being a compact 119M-parameter model trained on 18.8B tokens, PMNet matches the zero-shot long-context robustness of a Mamba model that is three times larger. Our ablation studies and gradient analyses confirm that the historical failure of explicit memory was a structural alignment problem, which PMNet effectively overcomes, providing a theoretically grounded foundation for scalable sequence modeling.
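The 85-slot count is the node count of a complete tree with branching factor 4 and four levels, where level h holds 4^(h-1) slots. The sketch below assumes a heap-style array layout (slot 0 is the root; the children of slot i are 4i+1 through 4i+4). That layout and the helper names are hypothetical illustrations, not the paper's indexing scheme.

```python
def slots_per_level(branching: int, depth: int) -> list[int]:
    """Slots at each level h = 1..depth of a complete tree: branching ** (h - 1)."""
    return [branching ** (h - 1) for h in range(1, depth + 1)]


def children(slot: int, branching: int) -> list[int]:
    """Hypothetical heap-style layout: children of slot i are b*i + 1 .. b*i + b."""
    return [branching * slot + k for k in range(1, branching + 1)]


def path_to_root(slot: int, branching: int) -> list[int]:
    """Walk from a slot up to the root (slot 0) via parent = (i - 1) // b."""
    path = [slot]
    while slot > 0:
        slot = (slot - 1) // branching
        path.append(slot)
    return path


counts = slots_per_level(branching=4, depth=4)
print(counts, sum(counts))     # [1, 4, 16, 64] 85
first_leaf = sum(counts[:-1])  # 21 internal slots precede the 64 leaves
print(path_to_root(84, 4))     # [84, 20, 4, 0]
print(children(20, 4))         # [81, 82, 83, 84]
```

In this assumed layout, every leaf-to-root path visits exactly one slot per level, so a path through the tree touches 4 slots rather than all 85.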

Problem

Research questions and friction points this paper is trying to address.

explicit memory

gradient instability

Backpropagation Through Time

scalable sequence modeling

memory volatility

Innovation

Methods, ideas, or system contributions that make the work stand out.

Phasor Memory Network

Unitary Phasor Dynamics

Explicit Memory