AI Summary
Designing deep learning accelerators for heterogeneous HPC and edge platforms faces key challenges, including insufficient exploitation of parallelism and excessive data movement overhead. This paper systematically surveys accelerator design methodologies, covering hardware-software co-design, high-level synthesis, domain-specific compilers (e.g., TVM, Halide), design space exploration, and cycle-accurate modeling and simulation. We propose, for the first time, a unified multi-dimensional classification framework that distills two fundamental principles: "minimizing data movement" and "maximizing parallelism." The survey bridges the gap between architectural overviews and implementation-oriented methodologies, explicitly identifying emerging directions such as approximate computing integrated with reconfigurability. Our work provides both a methodological foundation and practical guidance for developing efficient, scalable AI accelerators, enabling principled design decisions across diverse heterogeneous computing ecosystems.
Abstract
As deep neural networks grow in size and complexity, their efficient execution has become a pressing concern in the design of heterogeneous High-Performance Computing (HPC) and edge platforms, leading to a wide variety of proposals for specialized deep learning architectures and hardware accelerators. Designing such architectures and accelerators requires a multidisciplinary approach, combining expertise from several areas: machine learning, computer architecture, low-level hardware design, and approximate computing. Several methodologies and tools have been proposed to improve the process of designing accelerators for deep learning, aimed at maximizing parallelism and minimizing data movement to achieve high performance and energy efficiency. This paper critically reviews influential tools and design methodologies for deep learning accelerators, offering a wide perspective on this rapidly evolving field. This work complements existing surveys on architectures and accelerators by covering hardware-software co-design, automated synthesis, domain-specific compilers, design space exploration, modeling, and simulation, providing insights into technical challenges and open research directions.