AI Summary
Hardware accelerator design space exploration (DSE) faces challenges of enormous scale (O(10^17) design points), non-convexity, many-to-one mappings, and non-differentiability. To address these, this work pioneers the application of diffusion models to hardware design generation, proposing a condition-driven 1-D image synthesis framework that formulates architecture generation as learning the inverse performance mapping. Unlike gradient-based methods, the approach avoids initialization sensitivity and differentiability assumptions, enabling large-scale unstructured search. By integrating structured sampling with conditional generative networks, it achieves end-to-end, performance-guided architecture synthesis. Experiments demonstrate a 1312x speedup in search time, a 30% reduction in generation error, a 9.8% improvement in energy-delay product (EDP), and LLM inference energy efficiency 7.75x higher than DOSA.
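To make the "1-D image" framing concrete, the sketch below (PyTorch) shows how an accelerator configuration might be flattened into a 1-D vector, with a target-performance vector as the condition. All parameter names and values here are illustrative assumptions, not taken from the paper.

```python
import torch

# Hypothetical accelerator parameters; the paper's actual feature set may differ.
design = {
    "pe_rows": 16, "pe_cols": 16,        # processing-element array shape
    "glb_kib": 512, "spad_bytes": 256,   # on-chip buffer sizes
    "dram_bw_gbps": 64,                  # off-chip memory bandwidth
}

# Flatten the configuration into a 1-D vector -- the "1-D image" a conditional
# generative model would learn to synthesize (in practice each feature is
# normalized over the training dataset, not within a single sample).
x0 = torch.tensor([float(v) for v in design.values()])

# Target performance metrics act as the condition, e.g. (latency s, energy J).
condition = torch.tensor([1.2e-3, 0.35])
```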
Abstract
Design space exploration (DSE) is critical for developing optimized hardware architectures, especially for AI workloads such as deep neural networks (DNNs) and large language models (LLMs), which require specialized acceleration. As model complexity grows, accelerator design spaces have expanded to O(10^17) design points, becoming highly irregular, non-convex, and exhibiting many-to-one mappings from design configurations to performance metrics. This complexity renders direct inverse derivation infeasible and necessitates heuristic or sampling-based optimization. Conventional methods, including Bayesian optimization, gradient descent, reinforcement learning, and genetic algorithms, depend on iterative sampling, resulting in long runtimes and sensitivity to initialization. Deep learning-based approaches have reframed DSE as classification using recommendation models, but remain limited to small-scale (O(10^3)), less complex design spaces. To overcome these constraints, we propose a generative approach that models hardware design as 1-D image synthesis conditioned on target performance, enabling efficient learning of non-differentiable, non-bijective hardware-performance mappings. Our framework achieves 0.86% lower generation error than Bayesian optimization with a 17000x speedup, and outperforms GANDSE with 30% lower error at only 1.83x the search time. We further extend the method to a structured DSE setting, attaining 9.8% lower energy-delay product (EDP) and 6% higher performance, with up to 145.6x and 1312x faster search compared to existing optimization methods on O(10^17) design spaces. For LLM inference, our method achieves 3.37x and 7.75x lower EDP on a 32nm ASIC and a Xilinx UltraScale+ VPU13 FPGA, respectively, compared to the state-of-the-art DOSA framework.
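The conditional-generation loop the abstract describes can be sketched compactly. Below is a minimal, illustrative DDPM-style implementation in PyTorch, not the paper's actual model: the network size, noise schedule, design dimensionality, and condition variables are all assumptions.

```python
# Minimal sketch of conditional 1-D diffusion for design generation.
# Everything dimensional here (DESIGN_DIM, COND_DIM, T, hidden sizes) is illustrative.
import torch
import torch.nn as nn

DESIGN_DIM = 16   # hypothetical: number of accelerator parameters
COND_DIM = 2      # hypothetical: target metrics, e.g. (latency, energy)
T = 1000          # diffusion timesteps

class CondDenoiser(nn.Module):
    """Predicts the noise added to a 1-D design vector, given the timestep
    and a target-performance condition vector."""
    def __init__(self, hidden=256):
        super().__init__()
        self.time_emb = nn.Embedding(T, hidden)
        self.cond_emb = nn.Linear(COND_DIM, hidden)
        self.net = nn.Sequential(
            nn.Linear(DESIGN_DIM + 2 * hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, DESIGN_DIM),
        )

    def forward(self, x_t, t, cond):
        h = torch.cat([x_t, self.time_emb(t), self.cond_emb(cond)], dim=-1)
        return self.net(h)

# Standard DDPM linear noise schedule (chosen here for brevity).
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def train_step(model, opt, x0, cond):
    """One denoising step: corrupt designs at a random timestep and train the
    model to recover the noise, conditioned on the performance target."""
    t = torch.randint(0, T, (x0.size(0),))
    noise = torch.randn_like(x0)
    ab = alphas_bar[t].unsqueeze(-1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * noise
    loss = ((model(x_t, t, cond) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

@torch.no_grad()
def sample(model, cond, n):
    """Ancestral sampling: start from Gaussian noise and iteratively denoise
    toward design vectors matching the target `cond` (shape (1, COND_DIM))."""
    x = torch.randn(n, DESIGN_DIM)
    for i in reversed(range(T)):
        t = torch.full((n,), i, dtype=torch.long)
        eps = model(x, t, cond.expand(n, -1))
        alpha, ab = 1.0 - betas[i], alphas_bar[i]
        x = (x - betas[i] / (1 - ab).sqrt() * eps) / alpha.sqrt()
        if i > 0:
            x = x + betas[i].sqrt() * torch.randn_like(x)
    return x  # continuous vectors; decode/round to legal design points afterwards
```

Since the sampled vectors are continuous, they must be decoded back to legal design points; and because the hardware-performance mapping is many-to-one, drawing several samples per target and ranking them with a downstream cost model is a natural way to use such a generator.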