The Turbo-Charged Mapper: Fast and Optimal Mapping for Accelerator Modeling and Evaluation

📅 2026-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of efficiently finding optimal mappings for deep neural network (DNN) accelerators. Mappings strongly affect energy and latency, yet existing mappers rely on heuristics or metaheuristics and cannot guarantee optimality. The paper proposes the Turbo-Charged Mapper (TCM), which introduces the concept of “dataplacement” to expose redundant and suboptimal mappings. Pruning them shrinks the search space by up to 32 orders of magnitude, which lets TCM search the full remaining mapspace in under a minute while guaranteeing an optimal mapping. In contrast, prior mappers given more than ten hours of runtime still return mappings whose energy-delay product (EDP) is 21% higher than optimal.

📝 Abstract
The energy and latency of an accelerator running a deep neural network (DNN) depend on how the computation and data movement are scheduled in the accelerator (i.e., the mapping). Optimizing mappings is essential to evaluating and designing accelerators. However, the space of mappings is large, and prior works cannot guarantee finding optimal mappings because they use heuristics or metaheuristics to narrow down the space. These limitations preclude proper hardware evaluation, since designers cannot tell whether performance differences are due to changes in hardware or to suboptimal mappings. To address this challenge, we propose the Turbo-Charged Mapper (TCM), a fast mapper that is guaranteed to find optimal mappings. The key to our approach is that we define a new concept in mapping, called dataplacement, which, like the prior concept of dataflow, allows for clear analysis and comparison of mappings. Through it, we identify multiple opportunities to prune redundant and suboptimal mappings, reducing the search space by up to 32 orders of magnitude. Leveraging these insights, TCM can perform full mapspace searches, making it the first mapper that can find optimal mappings in feasible runtime. Compared to prior mappers, we show that TCM can find optimal mappings quickly (in less than a minute), while prior works cannot find optimal mappings (their energy-delay product is $21\%$ higher than optimal) even when given $1000\times$ the runtime ($>10$ hours).
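
The idea of pruning a mapspace without losing the optimum can be illustrated with a small sketch. It is not TCM's algorithm and does not use the dataplacement concept. It assumes a toy cost model for a tiled matrix multiply, and every name and constant in it (`edp`, `fits`, `prune_dominated`, the energy and bandwidth values) is hypothetical. In this toy model, a larger tile never costs more, so any tile that fits in the buffer and is no larger than another fitting tile in both dimensions can be dropped before the exhaustive search.

```python
from itertools import product

def divisors(n):
    """All positive divisors of n, used as candidate tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]

def fits(tm, tn, capacity):
    """Toy buffer check: a tm x tn output tile plus one column of A and one row of B."""
    return tm * tn + tm + tn <= capacity

def edp(M, N, K, tm, tn, e_dram=200.0, e_mac=1.0, pes=64, bandwidth=16):
    """Energy-delay product under an assumed (made-up) cost model."""
    macs = M * N * K
    # A is re-read once per column tile, B once per row tile, outputs written once.
    dram = M * K * (N // tn) + K * N * (M // tm) + M * N
    energy = dram * e_dram + macs * e_mac
    delay = max(macs / pes, dram / bandwidth)  # compute-bound or memory-bound
    return energy * delay

def valid_mappings(M, N, capacity):
    """Every (tm, tn) tiling that divides the problem and fits in the buffer."""
    return [(tm, tn) for tm, tn in product(divisors(M), divisors(N))
            if fits(tm, tn, capacity)]

def prune_dominated(candidates):
    """Keep only tiles that no other valid tile matches or exceeds in both dimensions."""
    return [c for c in candidates
            if not any(o != c and o[0] >= c[0] and o[1] >= c[1] for o in candidates)]

def search(M, N, K, candidates):
    """Exhaustive search: evaluate every candidate and return the lowest-EDP one."""
    best = min(candidates, key=lambda t: edp(M, N, K, *t))
    return best, edp(M, N, K, *best)

M = N = K = 64
all_maps = valid_mappings(M, N, capacity=512)
kept = prune_dominated(all_maps)
best_full, edp_full = search(M, N, K, all_maps)
best_pruned, edp_pruned = search(M, N, K, kept)
print(len(all_maps), "valid mappings,", len(kept), "after pruning")
print("optimum preserved:", edp_full == edp_pruned, "best tile:", best_pruned)
```

The dominance argument holds here only because the toy cost is monotone in both tile sizes. A real accelerator model would need its own proof that each pruning rule discards no optimal mapping.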
Problem

Research questions and friction points this paper is trying to address.

accelerator mapping
optimal mapping
DNN accelerator
mapspace search
hardware evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

optimal mapping
dataplacement
mapspace pruning
accelerator evaluation
energy-delay-product
Authors

Michael Gilbert
MIT
Tanner Andrulis
MIT
Vivienne Sze
Professor, EECS at MIT
VLSI, Low-Power Design, Machine Learning, Robotics, Video Coding
Joel S. Emer
MIT / Nvidia