Scope: A Scalable Merged Pipeline Framework for Multi-Chip-Module NN Accelerators

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of multi-chip module (MCM) neural network accelerators, which often suffer from low computational utilization and high off-chip communication overhead due to the inability of conventional parallelization strategies to simultaneously achieve scalability and efficiency. To overcome these challenges, the authors propose a unified multi-layer pipelined framework that introduces, for the first time, a cross-layer joint scheduling mechanism, transcending the constraints of intra-layer or inter-layer pipelining and enabling superior co-optimization across computation, communication, and memory. An efficient search algorithm is developed to reduce the design space exploration complexity from exponential to linear. Experimental results on ResNet-152 inference demonstrate that the proposed approach achieves up to 1.73× higher throughput compared to the state-of-the-art, while maintaining comparable energy consumption.
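The throughput benefit of the merged, cross-layer scheduling described above can be sketched with a toy model (all names, costs, and the greedy heuristic here are illustrative assumptions, not the paper's actual algorithm): steady-state pipeline throughput is bounded by the slowest stage, so merging consecutive layers into load-balanced stages can beat a naive fixed split of the same layers.

```python
def pipeline_throughput(stage_costs):
    """Steady-state pipeline throughput is limited by the slowest stage."""
    return 1.0 / max(stage_costs)

def greedy_merge(layer_costs, num_stages):
    """Greedily merge consecutive layers into `num_stages` stages,
    closing a stage once it reaches the average per-stage load.
    Illustrative only; the paper's merged-pipeline scheduler differs."""
    target = sum(layer_costs) / num_stages
    stages, current = [], 0.0
    for i, cost in enumerate(layer_costs):
        current += cost
        remaining_layers = len(layer_costs) - i - 1
        remaining_stages = num_stages - len(stages) - 1
        # Close a stage when it is full enough, but never leave fewer
        # layers than stages still to fill.
        if (remaining_stages > 0
                and remaining_layers >= remaining_stages
                and (current >= target or remaining_layers == remaining_stages)):
            stages.append(current)
            current = 0.0
    stages.append(current)
    return stages

# Hypothetical per-layer costs: a naive midpoint split [4,1] | [1,2]
# yields stages [5, 3] (throughput 1/5), while the balanced merge
# [4] | [1,1,2] yields [4, 4] (throughput 1/4).
```

The point of the sketch is only that jointly choosing stage boundaries across layers relaxes the compute/communication tradeoff that per-layer deployment locks in.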

📝 Abstract
Neural network (NN) accelerators with multi-chip-module (MCM) architectures enable integration of massive computation capability; however, they face challenges of computing-resource underutilization and off-chip communication overhead. Traditional parallelization schemes for NN inference on MCM architectures, such as intra-layer parallelism and inter-layer pipelining, fail to overcome both challenges at once, limiting the scalability of MCM architectures. We observe that existing works typically deploy layers separately rather than scheduling them jointly. This underexploited multi-layer dimension forces compromises between system computation and communication, hindering optimal utilization, especially as hardware and software scale. To address this limitation, we propose Scope, a merged pipeline framework that incorporates this overlooked multi-layer dimension, achieving improved throughput and scalability by relaxing the tradeoffs among computation, communication, and memory costs. This new dimension, however, enlarges the design space exploration (DSE) problem. To tackle this, we develop a series of search algorithms that achieve an exponential-to-linear complexity reduction while identifying solutions that rank in the top 0.05% of performance. Experiments show that Scope achieves up to 1.73x throughput improvement while maintaining similar energy consumption for ResNet-152 inference compared to state-of-the-art approaches.
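As a rough illustration of why exhaustive DSE explodes and a linear-time search is needed (a hypothetical sketch, not the paper's algorithm): cutting an L-layer network into S contiguous pipeline stages already admits C(L-1, S-1) candidate partitions, which grows combinatorially with depth, so enumerating them as below is feasible only for tiny networks.

```python
from itertools import combinations
from math import comb

def best_partition_bruteforce(layer_costs, num_stages):
    """Exhaustive DSE: evaluate every way to cut the layer sequence
    into `num_stages` contiguous stages, minimizing the slowest stage."""
    L = len(layer_costs)
    best = None
    for cuts in combinations(range(1, L), num_stages - 1):
        bounds = (0,) + cuts + (L,)
        stages = [sum(layer_costs[a:b]) for a, b in zip(bounds, bounds[1:])]
        if best is None or max(stages) < max(best):
            best = stages
    return best

# At ResNet-152 depth the candidate count is already astronomical:
# comb(151, 7) exceeds 10**11, before even counting the per-stage
# mapping choices -- hence the value of a linear-complexity search.
```

For a small example, `best_partition_bruteforce([4, 1, 1, 2], 2)` returns the balanced partition `[4, 4]`.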
Problem

Research questions and friction points this paper is trying to address.

multi-chip-module
neural network accelerator
resource underutilization
communication overhead
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-chip-module
merged pipeline
multi-layer scheduling
design space exploration
NN accelerator
Zongle Huang
Tsinghua University, Beijing National Research Center for Information Science and Technology
Hongyang Jia
Tsinghua University, Beijing National Research Center for Information Science and Technology
Kaiwei Zou
Capital Normal University
Yongpan Liu
Professor @ Tsinghua University
Machine Learning, Nonvolatile Memory and Computing, Energy Efficient VLSI, Embedded System, Design Methodology