AI Summary
Designing deep learning accelerators for heterogeneous HPC and edge platforms faces key challenges, including insufficient exploitation of parallelism and excessive data movement overhead. This paper systematically surveys accelerator design methodologies, covering hardware-software co-design, high-level synthesis, domain-specific compilers (e.g., TVM, Halide), design space exploration, and cycle-accurate modeling and simulation. We propose, for the first time, a unified multi-dimensional classification framework that distills two fundamental principles: "minimizing data movement" and "maximizing parallelism." The survey bridges the gap between architectural overviews and implementation-oriented methodologies, explicitly identifying emerging directions such as approximate computing integrated with reconfigurability. Our work provides both a methodological foundation and practical guidance for developing efficient, scalable AI accelerators, enabling principled design decisions across diverse heterogeneous computing ecosystems.
Abstract
As deep neural networks grow in size and complexity, their efficient execution has become a pressing concern in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide variety of proposals for specialized deep learning architectures and hardware accelerators. Designing such architectures and accelerators requires a multidisciplinary approach, combining expertise from several areas: machine learning, computer architecture, low-level hardware design, and approximate computing. Several methodologies and tools have been proposed to improve the process of designing accelerators for deep learning, aimed at maximizing parallelism and minimizing data movement to achieve high performance and energy efficiency. This paper critically reviews influential tools and design methodologies for deep learning accelerators, offering a wide perspective on this rapidly evolving field. This work complements existing surveys on architectures and accelerators by covering hardware-software co-design, automated synthesis, domain-specific compilers, design space exploration, modeling, and simulation, providing insights into technical challenges and open research directions.