Low Rank and Sparse Fourier Structure in Recurrent Networks Trained on Modular Addition

📅 2025-03-28

🏛️ IEEE International Conference on Acoustics, Speech, and Signal Processing

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the internal mechanisms that allow Recurrent Neural Networks (RNNs) to generalize on modular addition tasks. Method: Using Fourier analysis, spectral decomposition of the weight matrices, and frequency ablation experiments, we analyze the representations learned by trained RNNs. Contribution/Results: We find that trained RNNs implicitly implement a Fourier multiplication circuit. Their weight matrices exhibit low-rank structure, and model components can be attributed to specific Fourier frequencies, which yields a sparse representation in Fourier space. Modular addition is carried out through separate, parallel frequency channels: ablating a single frequency causes little performance degradation, whereas ablating more frequencies degrades performance drastically, indicating that the solution relies on this small set of frequencies. Furthermore, we establish a trade-off between Fourier sparsity and model robustness. This work offers a way to interpret RNNs through structured, frequency-based computation.
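As a rough illustration of the Fourier multiplication idea described above (not the trained RNN itself), the sketch below computes (a + b) mod p from sine and cosine features at a handful of frequencies. The function name `fourier_mod_add`, the modulus 23, and the frequency sets are assumptions chosen for the example.

```python
import numpy as np

def fourier_mod_add(a, b, p, freqs):
    """Toy Fourier-multiplication circuit: argmax over c of
    sum_k cos(2*pi*k*(a + b - c) / p) for the given frequencies k."""
    c = np.arange(p)
    logits = np.zeros(p)
    for k in freqs:
        w = 2 * np.pi * k / p
        # Multiply the inputs' sin/cos features (angle-addition identities)
        # to obtain cos(w(a+b)) and sin(w(a+b)).
        cos_ab = np.cos(w * a) * np.cos(w * b) - np.sin(w * a) * np.sin(w * b)
        sin_ab = np.sin(w * a) * np.cos(w * b) + np.cos(w * a) * np.sin(w * b)
        # cos(w(a+b-c)) = cos(w(a+b))cos(wc) + sin(w(a+b))sin(wc)
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return int(np.argmax(logits))

print(fourier_mod_add(17, 9, 23, freqs=[1, 5, 7]))  # 3, since (17 + 9) % 23 == 3
print(fourier_mod_add(17, 9, 23, freqs=[1, 7]))     # still 3 with frequency 5 dropped
```

In this idealized version, any single nonzero frequency already suffices when p is prime, so accuracy does not degrade under ablation; only the margin between the correct logit and the runner-up changes. The degradation reported for trained RNNs is an empirical finding that this toy does not reproduce.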

📝 Abstract

Modular addition tasks serve as a useful test bed for observing empirical phenomena in deep learning, including the phenomenon of grokking. Prior work has shown that one-layer transformer architectures learn Fourier Multiplication circuits to solve modular addition tasks. In this paper, we show that Recurrent Neural Networks (RNNs) trained on modular addition tasks also use a Fourier Multiplication strategy. We identify low rank structures in the model weights, and attribute model components to specific Fourier frequencies, resulting in a sparse representation in the Fourier space. We also show empirically that the RNN is robust to removing individual frequencies, while the performance degrades drastically as more frequencies are ablated from the model.
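To make "low rank" and "sparse in Fourier space" concrete, here is a minimal sketch on a synthetic matrix, not on the paper's trained weights. It builds a toy embedding from a few sine and cosine waves, checks its rank, reads off the active frequencies with a discrete Fourier transform along the token axis, and removes one frequency. The helper names (`synthetic_embedding`, `frequency_norms`, `ablate_frequency`) and all sizes are hypothetical.

```python
import numpy as np

def synthetic_embedding(p, d, freqs, seed=0):
    """Toy (p x d) matrix whose columns mix cos/sin waves at a few frequencies."""
    rng = np.random.default_rng(seed)
    t = np.arange(p)
    basis = []
    for k in freqs:
        basis.append(np.cos(2 * np.pi * k * t / p))
        basis.append(np.sin(2 * np.pi * k * t / p))
    basis = np.stack(basis, axis=1)               # (p, 2 * len(freqs))
    mix = rng.normal(size=(basis.shape[1], d))    # random mixing weights
    return basis @ mix

def frequency_norms(E):
    """Norm of each Fourier frequency's coefficients along the token axis."""
    return np.linalg.norm(np.fft.rfft(E, axis=0), axis=1)

def ablate_frequency(E, k):
    """Zero out frequency k along the token axis and transform back."""
    F = np.fft.rfft(E, axis=0)
    F[k] = 0
    return np.fft.irfft(F, n=E.shape[0], axis=0)

E = synthetic_embedding(p=23, d=16, freqs=[1, 5, 7])
print(np.linalg.matrix_rank(E, tol=1e-8))          # 6: one cos/sin pair per frequency
print(np.flatnonzero(frequency_norms(E) > 1e-6))   # [1 5 7]
E_ablated = ablate_frequency(E, k=5)
print(np.linalg.matrix_rank(E_ablated, tol=1e-8))  # 4
```

Each frequency contributes a two-dimensional (cosine, sine) subspace, so a matrix built from three frequencies has rank 6 regardless of its width, and zeroing one Fourier bin removes exactly one such pair.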

Problem

Research questions and friction points this paper is trying to address.

RNNs use Fourier Multiplication for modular addition

Identify low rank structures in model weights

RNNs are robust to single frequency ablation

Innovation

Methods, ideas, or system contributions that make the work stand out.

RNNs use Fourier Multiplication strategy

Identify low rank structures in weights

Sparse representation in Fourier space

🔎 Similar Papers

Fourier Circuits in Neural Networks and Transformers: A Case Study of Modular Arithmetic with Multiple Inputs