A structure-aware framework for learning device placements on computation graphs

📅 2024-05-23
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This paper addresses the device placement problem for neural network computation graphs on heterogeneous hardware. Methodologically, it proposes a structure-aware end-to-end optimization framework that, for the first time, unifies the grouper-placer and encoder-placer paradigms; introduces a variable-granularity, personalized graph partitioning mechanism inspired by graph parsing networks; and jointly performs graph coarsening, DAG-topology-aware node embedding learning, and reinforcement learning–based policy optimization, with execution time serving as the reward signal. The key contributions are explicit modeling of the directed acyclic graph (DAG) structure and support for dynamic grouping with joint representation learning. Experiments on Inception-V3, ResNet, and BERT demonstrate inference speedups of up to 58.2% over CPU-only execution and up to 60.24% over other commonly used baselines. Ablation studies confirm the effectiveness of each component and the overall robustness of the framework.

📝 Abstract
Computation graphs are Directed Acyclic Graphs (DAGs) in which nodes correspond to mathematical operations; they are widely used as abstractions in the optimization of neural networks. The device placement problem aims to identify optimal allocations of those nodes to a set of (potentially heterogeneous) devices. Existing approaches rely on two types of architectures, known as grouper-placer and encoder-placer, respectively. In this work, we bridge the gap between encoder-placer and grouper-placer techniques and propose a novel framework for the task of device placement, relying on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning, and policy optimization. It facilitates end-to-end training and takes into account the DAG nature of the computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, enabling graph representation learning and joint, personalized graph partitioning with an unspecified number of groups. To train the entire framework, we use reinforcement learning, with the execution time of the placement as the reward. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT. The robustness of the proposed framework is also highlighted through an ablation study. The suggested placements improve the inference speed of the benchmark models by up to 58.2% over CPU execution and by up to 60.24% compared to other commonly used baselines.
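The training signal described in the abstract (reinforcement learning with execution time as the reward) can be illustrated with a minimal REINFORCE-style sketch. Everything below is a toy assumption, not the paper's implementation: groups of a coarsened graph are assigned to devices by a per-group softmax policy, and a stand-in cost model (`simulate_exec_time`) replaces real hardware measurement.

```python
import numpy as np

rng = np.random.default_rng(0)
N_GROUPS, N_DEVICES = 6, 2                     # toy: device 0 = CPU, device 1 = accelerator
group_cost = rng.uniform(1.0, 3.0, N_GROUPS)   # hypothetical per-group workload
speedup = np.array([1.0, 4.0])                 # assumed relative device speeds

def simulate_exec_time(placement):
    # Crude stand-in for measured execution time: each device's time is the
    # cost of its groups divided by its speed; devices run in parallel, so
    # the makespan is the maximum per-device time.
    times = [group_cost[placement == d].sum() / speedup[d]
             for d in range(N_DEVICES)]
    return max(times)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.zeros((N_GROUPS, N_DEVICES))       # policy parameters
baseline, lr = None, 0.5

for step in range(300):
    probs = softmax(logits)
    # Sample a device for each group from the current policy.
    placement = np.array([rng.choice(N_DEVICES, p=p) for p in probs])
    reward = -simulate_exec_time(placement)     # faster placement -> higher reward
    # Exponential moving-average baseline reduces gradient variance.
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    adv = reward - baseline
    # REINFORCE: grad of log softmax is (one-hot of action) - probs.
    grad = -probs
    grad[np.arange(N_GROUPS), placement] += 1.0
    logits += lr * adv * grad

# Greedy placement under the learned policy.
best = softmax(logits).argmax(axis=1)
```

In the paper's framework, the sampled placement would instead come from DAG-aware node embeddings over a dynamically coarsened graph, and the reward would be the measured execution time on real devices; the update rule above only sketches the policy-gradient step.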
Problem

Research questions and friction points this paper is trying to address.

Device Allocation
Neural Networks
Heterogeneous Computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Optimization
Device Allocation
End-to-End Training