GMem: A Modular Approach for Ultra-Efficient Generative Models

📅 2024-12-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models suffer from parameter redundancy and low training/inference efficiency due to implicit semantic memory within network weights. To address this, we propose a memory-model decoupling paradigm: core semantic knowledge is explicitly stored in an external, immutable memory set, thereby eliminating the model’s reliance on internal memory capacity. We employ a lightweight diffusion backbone—compatible with SiT and LightningDiT—and introduce a modular memory mechanism alongside classifier-free guidance (CFG)-free sampling. This approach breaks the traditional trade-off among generation quality, diversity, and efficiency. On ImageNet 256×256, our method achieves FID=7.66 in just 28 epochs (~4 hours) and reaches FID=1.53 in 160 epochs (~20 hours) without CFG—substantially outperforming LightningDiT (FID=2.17 after 800 epochs) and delivering up to 50× training speedup.
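The decoupling idea above can be sketched as a toy: a frozen, externally stored bank of semantic vectors conditions each denoising step, replacing the second unconditional pass that classifier-free guidance would need. All names, shapes, and the linear "denoiser" below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy memory bank: semantic vectors extracted from the data,
# kept outside the denoiser's weights and never updated during training.
NUM_SNIPPETS, DIM = 1024, 64
memory_bank = rng.standard_normal((NUM_SNIPPETS, DIM))
memory_bank.setflags(write=False)  # immutable memory set

def denoiser(x, t, cond):
    """Stand-in for a lightweight diffusion backbone (e.g. SiT/LightningDiT-style).
    A toy update that pulls the noisy sample toward the memory vector."""
    return x - t * 0.1 * (x - cond)

def sample(steps=10):
    # CFG-free sampling: draw one memory snippet and condition every step on it,
    # rather than mixing conditional and unconditional predictions.
    snippet = memory_bank[rng.integers(NUM_SNIPPETS)]
    x = rng.standard_normal(DIM)
    for k in range(steps, 0, -1):
        x = denoiser(x, k / steps, snippet)
    return x

out = sample()
print(out.shape)  # (64,)
```

Because the memory set is external and immutable, swapping or extending it does not require retraining the backbone, which is the source of the claimed efficiency gains.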

📝 Abstract
Recent studies indicate that the denoising process in deep generative diffusion models implicitly learns and memorizes semantic information from the data distribution. These findings suggest that capturing more complex data distributions requires larger neural networks, leading to a substantial increase in computational demands, which in turn becomes the primary bottleneck in both training and inference of diffusion models. To this end, we introduce GMem: A Modular Approach for Ultra-Efficient Generative Models. GMem decouples memory capacity from the model and implements it as a separate, immutable memory set that preserves the essential semantic information in the data. This design reduces the network's reliance on internally memorizing complex data distributions, enhancing training efficiency, sampling efficiency, and generation diversity. On ImageNet at $256 \times 256$ resolution, GMem achieves a $50\times$ training speedup over SiT, reaching FID $=7.66$ in fewer than $28$ epochs ($\sim 4$ hours of training), whereas SiT requires $1400$ epochs. Without classifier-free guidance, GMem attains state-of-the-art (SoTA) performance of FID $=1.53$ in $160$ epochs with only $\sim 20$ hours of training, outperforming LightningDiT, which requires $800$ epochs and $\sim 95$ hours to reach FID $=2.17$.
Problem

Research questions and friction points this paper is trying to address.

Reduce computational demands in diffusion models.
Enhance training and sampling efficiency.
Decouple memory capacity from model architecture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular memory decoupling
Separate immutable memory set
Enhanced training and sampling efficiency
Yi Tang
Westlake University
Peng Sun
Westlake University, Zhejiang University
Zhenglin Cheng
Zhejiang University & Westlake University, SII
Multimodal Learning, Diffusion Models
Tao Lin
Westlake University