🤖 AI Summary
Diffusion models suffer from high computational overhead and low energy efficiency due to algorithmic complexity, which hinders their deployment in generative AI. To address this, the authors propose spintronic in-memory stochastic computing hardware based on voltage-controlled magnetoelectric memory. The hardware implements a true random diffusion process, following Langevin dynamics, at the physical level, and its in-memory computing avoids the von Neumann separation of memory and computing units. Combining non-volatile magnetic memory with in-memory computation, it supports neuromorphic diffusion sampling in hardware. Experiments demonstrate image generation quality, measured by the Fréchet inception distance (FID), on par with software-based implementations, with ~10³× better energy per bit per area (J/bit/μm²) than traditional hardware. The work points to a scalable, low-power hardware approach for generative AI.
📝 Abstract
Stochastic diffusion processes are pervasive in nature, from the seemingly erratic Brownian motion to the complex interactions of synaptically coupled spiking neurons. Recently, drawing inspiration from Langevin dynamics, neuromorphic diffusion models were proposed and have become one of the major breakthroughs in the field of generative artificial intelligence. Unlike discriminative models, which are well developed for classification and regression tasks, diffusion models and other generative models such as ChatGPT aim to create content based on learned context. However, the greater algorithmic complexity of these models results in high computational costs on today's technologies, creating an efficiency bottleneck and impeding further development. Here, we develop a spintronic voltage-controlled magnetoelectric memory hardware for the neuromorphic diffusion process. The in-memory computing capability of our spintronic devices goes beyond the current von Neumann architecture, in which memory and computing units are separated. Together with the non-volatility of magnetic memory, this enables high-speed and low-cost computing, which is desirable given the increasing scale of generative models. We experimentally demonstrate that the hardware-based true random diffusion process can be implemented for image generation and achieves image quality comparable to software-based training, as measured by the Fréchet inception distance (FID) score, while achieving ~10^3 times better energy-per-bit-per-area than traditional hardware.
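As a rough software illustration of the Langevin-type dynamics that inspire diffusion models, the sketch below runs unadjusted Langevin updates toward a simple one-dimensional Gaussian target. Everything in it is an assumption made for illustration: the function names, the Gaussian target, the step size, and the chain counts do not come from the paper. Python's pseudorandom `random.gauss` merely stands in for a noise source, and nothing here models the spintronic device or the image-generation pipeline.

```python
import math
import random


def gaussian_score(x, mean=2.0, std=0.5):
    """Score (gradient of the log-density) of a 1-D Gaussian target.

    The target distribution is a hypothetical stand-in, not the paper's model.
    """
    return -(x - mean) / (std ** 2)


def langevin_sample(score, n_steps=500, step_size=0.01,
                    noise=random.gauss, x0=0.0):
    """Run one chain of unadjusted Langevin dynamics and return its final state.

    Each step drifts along the score and adds Gaussian noise. `noise` is
    pluggable so that a different random source could be substituted.
    """
    x = x0
    for _ in range(n_steps):
        drift = 0.5 * step_size * score(x)
        kick = math.sqrt(step_size) * noise(0.0, 1.0)
        x = x + drift + kick
    return x


def run_chains(n_chains=500, seed=0):
    """Draw one sample per independent chain, seeded for reproducibility."""
    random.seed(seed)
    return [langevin_sample(gaussian_score) for _ in range(n_chains)]


if __name__ == "__main__":
    samples = run_chains()
    mean = sum(samples) / len(samples)
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
    print(f"sample mean = {mean:.3f}, sample std = {std:.3f}")
```

With these settings the sample mean and standard deviation should land close to the target's 2.0 and 0.5, up to sampling error and a small discretization bias from the finite step size. The noise source is passed in as a parameter because the randomness is the part of such a loop that a physical random process would replace.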