Neural Network Diffusion

📅 2024-02-20
🏛️ arXiv.org
📈 Citations: 8
Influential: 2
📄 PDF
🤖 AI Summary
To address the limitations of obtaining neural network parameters through conventional supervised training (high cost, poor generalization, and limited diversity), this paper proposes the first end-to-end diffusion-based framework for direct neural network parameter synthesis. Methodologically, an autoencoder is first trained to extract latent representations of a subset of trained network parameters; a denoising diffusion probabilistic model (DDPM) is then trained to synthesize these latents from random noise, and the autoencoder's decoder reconstructs sampled latents into fully deployable model weights. Key contributions include: (i) the first application of diffusion models to neural network parameter generation, producing novel parameters rather than memorized copies of the training checkpoints, and supporting architecture-agnostic synthesis; and (ii) demonstrated compatibility with diverse architectures (e.g., CNNs, ViTs) and benchmarks (e.g., CIFAR, ImageNet). Experiments show that generated models achieve accuracy comparable to or exceeding their conventionally trained counterparts, with minimal additional generation cost.
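The two-stage pipeline in the summary can be sketched on toy data. This is a hypothetical illustration only: the "autoencoder" below is a closed-form PCA projection standing in for the paper's trained encoder/decoder, the checkpoint data is random, and the sizes (200 checkpoints, 512 parameters, 32 latent dimensions) are invented for the sketch. A diffusion model would then be trained on the resulting latents.

```python
# Stage 1 of a p-diff-style pipeline (hypothetical sketch):
# compress flattened parameter vectors into a latent space.
# PCA via SVD plays the role of the learned autoencoder here.
import numpy as np

rng = np.random.default_rng(0)

# Pretend we collected 200 checkpoints of a 512-parameter subset
# (the paper uses subsets such as normalization-layer weights).
params = rng.normal(size=(200, 512))

k = 32                                  # latent dimensionality (assumed)
mean = params.mean(axis=0)
centered = params - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)

def encode(p):
    # Project onto the top-k principal directions: (N, 512) -> (N, 32)
    return (p - mean) @ vt[:k].T

def decode(z):
    # Map latents back to parameter space: (N, 32) -> (N, 512)
    return z @ vt[:k] + mean

latents = encode(params)
recon = decode(latents)
print("latent shape:", latents.shape)
print("mean reconstruction error:", float(np.abs(params - recon).mean()))
# Stage 2 would train a DDPM to denoise `latents` from Gaussian noise.
```

The point of the sketch is the shape of the pipeline, not the compression quality: real parameter checkpoints are far more structured than Gaussian noise, which is what makes the latent diffusion step viable.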

📝 Abstract
Diffusion models have achieved remarkable success in image and video generation. In this work, we demonstrate that diffusion models can also generate high-performing neural network parameters. Our approach is simple, utilizing an autoencoder and a diffusion model. The autoencoder extracts latent representations of a subset of the trained neural network parameters. Next, a diffusion model is trained to synthesize these latent representations from random noise. This model then generates new representations, which are passed through the autoencoder's decoder to produce new subsets of high-performing network parameters. Across various architectures and datasets, our approach consistently generates models with comparable or improved performance over trained networks, with minimal additional cost. Notably, we empirically find that the generated models are not memorizing the trained ones. Our results encourage more exploration into the versatile use of diffusion models. Our code is available at https://github.com/NUS-HPC-AI-Lab/Neural-Network-Diffusion.
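The generation step described in the abstract (sample noise in the latent space, run the reverse diffusion process, decode into parameters) can be sketched as follows. Everything here is a structural stand-in: `predict_eps` is a stub for the trained noise-prediction network, `decode` is a stub for the autoencoder's decoder, and the schedule length and dimensions are assumptions for the sketch, so the output is not a meaningful parameter vector.

```python
# Hypothetical sketch of p-diff generation: noise -> latent -> parameters.
import numpy as np

rng = np.random.default_rng(0)

T = 100                                   # short schedule for the sketch
betas = np.linspace(1e-4, 0.02, T)        # standard linear beta schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def predict_eps(z_t, t):
    # Stub for the trained denoising network (always predicts zero noise).
    return np.zeros_like(z_t)

def sample_latent(dim=32):
    z = rng.normal(size=dim)              # start from pure Gaussian noise
    for t in range(T - 1, -1, -1):
        eps = predict_eps(z, t)
        # DDPM reverse-step mean (Ho et al. 2020, Eq. 11)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            z = z + np.sqrt(betas[t]) * rng.normal(size=dim)
    return z

def decode(z, out_dim=512):
    # Stub decoder: a fixed random linear map in place of the trained one.
    w = np.random.default_rng(1).normal(size=(z.shape[0], out_dim))
    return z @ w / np.sqrt(z.shape[0])

new_params = decode(sample_latent())
print("generated parameter vector shape:", new_params.shape)
# In the actual method, `new_params` would be loaded back into the target
# architecture as its (subset of) weights, ready for evaluation.
```

Because the expensive denoising happens in a low-dimensional latent space and the decoder is a single forward pass, the per-model generation cost stays small, which matches the abstract's "minimal additional cost" claim.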
Problem

Research questions and friction points this paper is trying to address.

Neural Network Generation
Diffusion Model
Autoencoder
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Models
Autoencoders
Parameter Generation