🤖 AI Summary
This paper addresses the static mapping of large directed acyclic graph (DAG) task sets onto heterogeneous systems that combine CPUs, GPUs, FPGAs and AI units. The problem is hard when heterogeneity is high, when the application has many tasks and dependencies, and when the streaming behavior of FPGAs must be taken into account. We propose a general mapping principle based on graph decomposition and model-based evaluation. Key contributions include: (i) a new algorithm that computes a forest of series-parallel decomposition trees for general DAGs; and (ii) a decomposition-based mapping algorithm built on that forest. In a comparison with three mixed-integer linear programs (MILPs), one genetic algorithm and two HEFT variants, our approach yields substantially higher makespan improvements than the HEFT variants in complex environments, while running orders of magnitude faster than the genetic-algorithm and MILP-based mappers.
📝 Abstract
Modern heterogeneous systems consist of many different processing units, such as CPUs, GPUs, FPGAs and AI units. A central problem in the design of applications in this environment is to find a beneficial mapping of tasks to processing units. While there are various approaches to task mapping, few can deal with high heterogeneity or applications with a high number of tasks and many dependencies. In addition, streaming aspects of FPGAs are generally not considered. We present a new general task mapping principle based on graph decompositions and model-based evaluation that can find beneficial mappings regardless of the complexity of the scenario. We apply this principle to create a high-quality and reasonably efficient task mapping algorithm using series-parallel decompositions. For this, we present a new algorithm to compute a forest of series-parallel decomposition trees for general DAGs. We compare our decomposition-based mapping algorithm with three mixed-integer linear programs, one genetic algorithm and two variations of the Heterogeneous Earliest Finish Time (HEFT) algorithm. We show that our approach can generate mappings that lead to substantially higher makespan improvements than the HEFT variations in complex environments while being orders of magnitude faster than a mapper based on genetic algorithms or integer linear programs.
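To make "model-based evaluation" of a mapping concrete, the following is a minimal sketch of how the makespan of one candidate mapping could be estimated on a small DAG. It is not the cost model used in the paper: it assumes that each device runs one task at a time, that a fixed transfer delay applies whenever an edge crosses devices, and that FPGA streaming is ignored. The function `estimate_makespan`, the task names, and all timing values are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def estimate_makespan(preds, exec_time, transfer_time, mapping):
    """Estimate the makespan of a task-to-device mapping (simplified model).

    preds:         task -> list of predecessor tasks (the DAG)
    exec_time:     (task, device) -> execution time on that device
    transfer_time: delay added when a dependency crosses devices (assumption)
    mapping:       task -> device
    """
    finish = {}       # task -> finish time
    device_free = {}  # device -> time at which it becomes idle
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        dev = mapping[task]
        ready = 0.0
        for p in preds.get(task, ()):
            delay = 0.0 if mapping[p] == dev else transfer_time
            ready = max(ready, finish[p] + delay)
        # A task starts once its inputs have arrived and its device is idle.
        start = max(ready, device_free.get(dev, 0.0))
        finish[task] = start + exec_time[(task, dev)]
        device_free[dev] = finish[task]
    return max(finish.values(), default=0.0)


# Hypothetical diamond-shaped DAG: a -> {b, c} -> d
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
exec_time = {
    ("a", "cpu"): 2, ("b", "cpu"): 4, ("c", "cpu"): 4, ("d", "cpu"): 2,
    ("b", "gpu"): 1,
}

all_cpu = estimate_makespan(preds, exec_time, 1.0, dict.fromkeys(preds, "cpu"))
offload_b = estimate_makespan(
    preds, exec_time, 1.0, {"a": "cpu", "b": "gpu", "c": "cpu", "d": "cpu"}
)
print(all_cpu, offload_b)  # 12.0 8.0
```

In this toy setting, moving task `b` to the GPU shortens the estimated makespan from 12 to 8 time units even though two transfers are added. A mapper can compare candidate mappings with such an evaluation function; a decomposition such as the series-parallel forest described above would then determine which groups of tasks are considered for remapping together.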