🤖 AI Summary
Integrating custom hardware accelerators—particularly GEMM-based ones—into mainstream ML compilers remains challenging due to tight coupling between accelerator-specific optimizations and compiler internals. Method: This paper proposes a high-level, low-intrusion integration methodology for TVM that abstracts hardware scheduling interfaces to decouple accelerator characteristics from compiler implementation. Leveraging the CoSA design-space exploration framework, it automates hardware-aware scheduling optimizations—including tensor tiling, non-uniform mapping, and double buffering—without modifying TVM’s core infrastructure. Contribution/Results: Evaluated on the Gemmini accelerator, the approach achieves performance on par with hand-optimized toolchains while significantly improving developer productivity and cross-model/cross-architecture portability. The abstraction enables seamless reuse of scheduling policies across diverse accelerator microarchitectures and neural network workloads, reducing integration effort from weeks to hours.
📝 Abstract
The growing adoption of domain-specific architectures in edge computing platforms for deep learning has underscored the efficiency benefits of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML) compilers remains a complex challenge, as it requires significant modifications to compilation layers and specialized scheduling techniques. Existing frameworks offer only partial solutions and force users to navigate intricate compiler internals.
In this paper, we introduce a TVM-based compilation approach targeting GEMM-based deep learning accelerators. Our approach abstracts away the complexities of compiler integration, enabling accelerators to be integrated seamlessly without in-depth knowledge of the underlying compiler. Furthermore, we extend and incorporate design-space exploration tools, specifically CoSA, to automate efficient tensor scheduling, accounting for factors such as uneven mapping and double buffering. Our framework is benchmarked on the Gemmini accelerator, demonstrating performance comparable to that of its specialized, manually implemented toolchain.
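To make the scheduling concepts concrete, the sketch below illustrates in plain NumPy what tensor tiling means for a GEMM workload: the output is computed block by block from tile-sized slices of the inputs, which is the loop structure a scheduler maps onto an accelerator's scratchpad. This is purely illustrative and not code from the paper or from TVM; the function name and tile size are hypothetical, and the double-buffering aspect is only indicated in a comment, since overlapping loads with compute requires actual hardware queues.

```python
import numpy as np

def gemm_tiled(A, B, tile=16):
    """GEMM with output tiling (illustrative sketch, not the paper's code).

    Each (tile x tile) block of C is accumulated from tile-sized slices
    of A and B, mirroring how a hardware-aware schedule maps tiles onto
    an accelerator's local scratchpad memory.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N), dtype=np.result_type(A, B))
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                # In a double-buffered schedule, the DMA load of the
                # *next* A/B tiles would overlap with this tile's compute.
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile]
                    @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C

# Tiled result matches the untiled reference GEMM.
A = np.arange(32 * 32, dtype=np.float64).reshape(32, 32)
B = np.ones((32, 32))
assert np.allclose(gemm_tiled(A, B, tile=16), A @ B)
```

Choosing the tile size (and handling uneven mappings, where tile sizes do not divide the tensor dimensions evenly) is exactly the kind of decision the CoSA-based design-space exploration automates.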