AI Summary
To address substantial data movement overhead, low hardware-software co-design efficiency, and difficulties in customized deployment on heterogeneous accelerator clusters, this paper proposes SNAX, an open-source hardware-software co-development framework. Its core innovation is a hybrid coupling mechanism that integrates loosely coupled asynchronous control with tightly coupled shared-memory data access, complemented by reusable accelerator interconnect IP cores, an MLIR-based customizable compiler, and an asynchronous control bus. This design significantly improves integration efficiency and cross-platform compatibility while preserving system usability. Evaluated on a low-power heterogeneous SoC, SNAX achieves over 10× higher neural network task throughput compared to state-of-the-art systems, with sustained average accelerator utilization exceeding 90%.
Abstract
Heterogeneous accelerator-centric compute clusters are emerging as efficient solutions for diverse AI workloads. However, current integration strategies often compromise data movement efficiency and encounter compatibility issues in hardware and software. This prevents a unified approach that balances performance and ease of use. To this end, we present SNAX, an open-source integrated HW-SW framework enabling efficient multi-accelerator platforms through a novel hybrid-coupling scheme, consisting of loosely coupled asynchronous control and tightly coupled data access. SNAX brings reusable hardware modules designed to enhance compute accelerator utilization, and its customizable MLIR-based compiler to automate key system management tasks, jointly enabling rapid development and deployment of customized multi-accelerator compute clusters. Through extensive experimentation, we demonstrate SNAX's efficiency and flexibility in a low-power heterogeneous SoC. Accelerators can easily be integrated and programmed to achieve > 10x improvement in neural network performance compared to other accelerator systems while maintaining accelerator utilization of > 90% in full system operation.
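The hybrid-coupling idea can be illustrated with a toy software model. The sketch below is not SNAX code and does not use any SNAX API. It assumes Python threads as stand-ins for accelerators, a plain list as the shared memory, and hypothetical helper names (`accelerator`, `launch`, `run_demo`). The host posts a command and returns immediately, which models loosely coupled asynchronous control, while each accelerator works in place on the shared buffer, which models tightly coupled data access.

```python
import threading
from queue import Queue

# Shared scratchpad: every "accelerator" reads and writes it directly,
# standing in for tightly coupled shared-memory data access.
shared_mem = [0] * 16


def accelerator(cmd_queue: Queue, done: threading.Event) -> None:
    """Hypothetical accelerator: takes one command, works in place on shared_mem."""
    base, length, scale = cmd_queue.get()  # configuration posted by the host
    for i in range(base, base + length):
        shared_mem[i] *= scale             # no copy into a private buffer
    done.set()                             # completion signal back to the host


def launch(base: int, length: int, scale: int) -> threading.Event:
    """Host side: post a command and return immediately (asynchronous control)."""
    cmd_queue: Queue = Queue()
    done = threading.Event()
    threading.Thread(target=accelerator, args=(cmd_queue, done)).start()
    cmd_queue.put((base, length, scale))
    return done


def run_demo() -> list:
    for i in range(16):
        shared_mem[i] = i
    # Two accelerators work on disjoint halves of the buffer concurrently.
    first = launch(0, 8, 2)
    second = launch(8, 8, 3)
    # The host could do other work here; it synchronizes only when it needs results.
    first.wait()
    second.wait()
    return list(shared_mem)
```

Calling `run_demo()` returns the buffer after both workers finish. The two regions are disjoint, so no locking is needed in this toy model. A real cluster would also have to arbitrate memory bank conflicts, which the sketch ignores.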