Memory-Efficient Training with In-Place FFT Implementation

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing FFT implementations, including the real-to-complex FFT (rFFT), are not truly in-place: rFFT maps a length-$n$ real input to a complex output of size $n/2+1$, causing a dimensional mismatch and non-negligible auxiliary memory overhead. This work proposes rdFFT, the first fully in-place real-domain FFT framework, which jointly leverages implicit complex-number encoding, frequency-domain conjugate-symmetry modeling, and in-place butterfly computation so that input and output share a single $n$-dimensional real memory buffer, eliminating all intermediate storage. This strictly in-place realization of rFFT reduces peak training memory by 32%–47%. The authors validate its effectiveness on multiple NLU tasks with BERT and RoBERTa, demonstrating consistent accuracy preservation. As an efficient, memory-optimal primitive, rdFFT provides foundational support for lightweight, frequency-domain deep learning.

📝 Abstract
Fast Fourier Transforms (FFT) are widely used to reduce memory and computational costs in deep learning. However, existing implementations, including standard FFT and real FFT (rFFT), cannot achieve true in-place computation. In particular, rFFT maps an input of size n to a complex output of size n/2+1, causing a dimensional mismatch and requiring additional memory allocation. We propose the first real-domain, fully in-place FFT framework (rdFFT) that preserves input-output memory space consistency. By leveraging butterfly operation symmetry and conjugate properties in the frequency domain, we design an implicit complex encoding scheme that eliminates intermediate cache usage entirely. Experiments on multiple natural language understanding tasks demonstrate the method's effectiveness in reducing training memory cost, offering a promising direction for frequency-domain lightweight adaptation.
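The size mismatch the abstract describes is easy to verify with NumPy's `rfft` (an illustrative check, not code from the paper): a length-n real input yields n/2+1 complex bins, which land in a freshly allocated, slightly larger buffer rather than reusing the input's memory.

```python
import numpy as np

n = 1024
x = np.random.default_rng(0).standard_normal(n)   # real input: n float64 values

X = np.fft.rfft(x)                                # complex output: n//2 + 1 bins
assert X.shape[0] == n // 2 + 1                   # 513, not 1024: shapes differ

# The complex result is a separate allocation, slightly larger than the input:
# (n//2 + 1) * 16 bytes of complex128 vs. n * 8 bytes of float64.
print(x.nbytes, X.nbytes)                         # 8192 8208
```

During training, that separate output buffer means both input and spectrum are live at once, which is exactly the auxiliary overhead rdFFT aims to eliminate.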
Problem

Research questions and friction points this paper is trying to address.

Achieving true in-place computation for FFT operations
Eliminating dimensional mismatch and memory allocation issues
Reducing training memory costs through implicit encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-domain in-place FFT eliminates memory allocation
Implicit complex encoding removes intermediate cache usage
Butterfly symmetry enables input-output memory consistency
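The conjugate-symmetry idea behind these contributions can be illustrated with the classic "halfcomplex" packing used by FFTW's real transforms: since X[k] = conj(X[n-k]) and bins 0 and n/2 are purely real for even n, the half-spectrum carries exactly n real degrees of freedom and therefore fits in the same n real slots as the input. The sketch below is written against NumPy and is not the paper's rdFFT implementation; a true in-place butterfly kernel would overwrite the input buffer directly instead of allocating `out`.

```python
import numpy as np

def rfft_halfcomplex(x):
    """Real FFT whose output occupies the same n real slots as the input.

    Layout (FFTW's "halfcomplex" format for even n):
      [re(X0), re(X1), ..., re(X_{n/2}), im(X_{n/2-1}), ..., im(X1)]
    im(X0) and im(X_{n/2}) are zero for real input, so nothing is lost.
    """
    n = len(x)
    assert n % 2 == 0
    X = np.fft.rfft(x)                    # n//2 + 1 complex bins
    out = np.empty(n)
    out[: n // 2 + 1] = X.real            # re(X0) .. re(X_{n/2})
    out[n // 2 + 1 :] = X.imag[-2:0:-1]   # im(X_{n/2-1}) .. im(X1)
    return out

def irfft_halfcomplex(h):
    """Unpack the halfcomplex layout and invert back to the real signal."""
    n = len(h)
    X = np.empty(n // 2 + 1, dtype=complex)
    X.real = h[: n // 2 + 1]
    X.imag[0] = 0.0                       # DC bin is purely real
    X.imag[-1] = 0.0                      # Nyquist bin is purely real
    X.imag[1:-1] = h[: n // 2 : -1]       # reverse of the packed imag half
    return np.fft.irfft(X, n)

x = np.random.default_rng(1).standard_normal(16)
h = rfft_halfcomplex(x)
assert h.shape == x.shape                 # spectrum fits in the same n slots
assert np.allclose(irfft_halfcomplex(h), x)
```

The packing shows why an n-real buffer suffices; making the transform strictly in-place on that buffer, with no temporary like `out` and no explicit complex intermediates, is the additional step rdFFT's implicit complex encoding and in-place butterflies provide.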