🤖 AI Summary
Existing FFT implementations, including real-to-complex FFT (rFFT), are not truly in-place: rFFT maps a length-$n$ real input to a complex output of size $n/2+1$, causing a dimensional mismatch and non-negligible auxiliary memory overhead. This work proposes rdFFT, the first fully in-place real-domain FFT framework, which jointly leverages implicit complex-number encoding, frequency-domain conjugate-symmetry modeling, and in-place butterfly computation so that input and output share a single $n$-dimensional real memory buffer, eliminating all intermediate storage. rdFFT is the first strictly in-place realization of rFFT and reduces peak training memory by 32%–47%. We validate its effectiveness across multiple NLU tasks using BERT and RoBERTa, showing that accuracy is consistently preserved. By providing an efficient, memory-optimal primitive, rdFFT establishes foundational support for lightweight, frequency-domain deep learning.
📝 Abstract
Fast Fourier Transforms (FFTs) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps a real input of size n to a complex output of size n/2+1, causing a dimensional mismatch and requiring additional memory allocation. We propose the first real-domain, fully in-place FFT framework (rdFFT), which preserves input-output memory space consistency. By leveraging butterfly-operation symmetry and conjugate properties in the frequency domain, we design an implicit complex encoding scheme that entirely eliminates intermediate cache usage. Experiments on multiple natural language understanding tasks demonstrate the method's effectiveness in reducing training memory cost, offering a promising direction for frequency-domain lightweight adaptation.
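The dimensional mismatch the abstract describes is easy to see with a standard rFFT routine. The sketch below uses NumPy's `np.fft.rfft` (not the proposed rdFFT, whose implementation is not reproduced here) to show that a length-n real input yields n/2+1 complex outputs, whose buffer is larger than the input's, so the result cannot simply overwrite the input array. The comment on conjugate symmetry notes why an n-real-value packing is possible in principle, which is the redundancy rdFFT exploits:

```python
import numpy as np

n = 8
x = np.random.rand(n).astype(np.float64)  # length-n real input, 8*n bytes

X = np.fft.rfft(x)  # complex output of length n//2 + 1

print(X.shape[0])          # 5, not 8: dimensional mismatch with the input
print(x.nbytes, X.nbytes)  # 64 vs 80: output buffer exceeds the input buffer

# The n//2 + 1 complex bins hold 2*(n//2 + 1) = n + 2 real numbers, but by
# conjugate symmetry the DC bin (and, for even n, the Nyquist bin) is purely
# real, so only n independent real values remain -- exactly enough to fit in
# the original n-element real buffer, which is what an in-place scheme packs.
assert abs(X[0].imag) < 1e-12 and abs(X[n // 2].imag) < 1e-12
```

Standard libraries sidestep the mismatch by allocating a separate output array (plus internal workspace), which is precisely the auxiliary memory the paper aims to eliminate.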