An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems

📅 2025-08-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address substantial data movement overhead, low hardware-software co-design efficiency, and difficulties in customized deployment on heterogeneous accelerator clusters, this paper proposes SNAX, an open-source hardware-software co-development framework. Its core innovation is a hybrid coupling mechanism that combines loosely coupled asynchronous control with tightly coupled shared-memory data access, complemented by reusable accelerator interconnect IP cores, a customizable MLIR-based compiler, and an asynchronous control bus. This design significantly improves integration efficiency and cross-platform compatibility while preserving system usability. Evaluated on a low-power heterogeneous SoC, SNAX achieves over 10× higher neural network task throughput than state-of-the-art systems, with sustained average accelerator utilization exceeding 90%.
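The hybrid-coupling idea can be illustrated with a toy sketch: the host and accelerator share one memory region (tight data coupling, so no buffers are copied), but synchronize only through start/done flags (loose, asynchronous control coupling, so the host is free to work while the accelerator runs). This is a minimal Python analogy of the pattern, not the actual SNAX hardware interface or API; all names here are hypothetical.

```python
import threading

# Illustrative model of hybrid coupling (hypothetical names, not SNAX's API):
# a worker thread stands in for the accelerator, a shared list for the
# cluster's shared scratchpad memory, and two events for control registers.

shared_mem = [1, 2, 3, 4]     # stand-in for tightly coupled shared memory
start = threading.Event()     # "launch job" control register
done = threading.Event()      # "job finished" status register

def accelerator():
    start.wait()                    # idle until the host kicks off the job
    for i, x in enumerate(shared_mem):
        shared_mem[i] = x * 2       # compute in place: data is never copied
    done.set()                      # signal completion asynchronously

worker = threading.Thread(target=accelerator)
worker.start()

start.set()      # host: fire-and-forget launch (loosely coupled control)
# ...the host could schedule other accelerators or do useful work here...
done.wait()      # host: synchronize only when the result is actually needed
worker.join()
print(shared_mem)  # -> [2, 4, 6, 8]
```

Decoupling control from data this way is what lets a cluster keep several accelerators busy concurrently: launches are cheap asynchronous register writes, while operands stay put in shared memory.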

๐Ÿ“ Abstract
Heterogeneous accelerator-centric compute clusters are emerging as efficient solutions for diverse AI workloads. However, current integration strategies often compromise data movement efficiency and encounter compatibility issues in hardware and software. This prevents a unified approach that balances performance and ease of use. To this end, we present SNAX, an open-source integrated HW-SW framework enabling efficient multi-accelerator platforms through a novel hybrid-coupling scheme, consisting of loosely coupled asynchronous control and tightly coupled data access. SNAX brings reusable hardware modules designed to enhance compute accelerator utilization, and a customizable MLIR-based compiler that automates key system management tasks, jointly enabling rapid development and deployment of customized multi-accelerator compute clusters. Through extensive experimentation, we demonstrate SNAX's efficiency and flexibility in a low-power heterogeneous SoC. Accelerators can easily be integrated and programmed to achieve > 10x improvement in neural network performance compared to other accelerator systems while maintaining accelerator utilization of > 90% in full system operation.
Problem

Research questions and friction points this paper is trying to address.

Addressing data movement inefficiency in multi-accelerator AI systems
Resolving hardware-software compatibility issues in heterogeneous clusters
Enabling a unified development approach that balances performance and ease of use for accelerator platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source HW-SW co-development framework
Hybrid-coupling with asynchronous control
MLIR-based compiler automating system management