Applying Graph Explanation to Operator Fusion

📅 2024-12-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Operator fusion in DNN inference often suffers from suboptimal grouping, leading to redundant DRAM accesses and diminished memory efficiency. Method: This paper pioneers the integration of Graph Explanation Techniques (GET) from explainable AI into fusion optimization, proposing an attribution-driven recursive fusion-group partitioning method. GET identifies the bottleneck operators that render a fusion group invalid; a greedy tree-based partitioning strategy then recursively decomposes the fusion graph into high-efficiency subgroups. The method is applied to two fusion patterns, Line-Buffer Depth First (LBDF) and Branch Requirement Reduction (BRR), to minimize off-chip memory traffic. Contribution/Results: Unlike conventional heuristic search approaches prone to local optima, the method requires no model retraining or architectural modification. Evaluated on EfficientNet-B3, it reduces DRAM accesses by over 20%; on ResNet and MobileNet variants, it significantly improves fusion success rate and end-to-end inference latency.

📝 Abstract
Layer fusion techniques are critical to improving the inference efficiency of deep neural networks (DNNs) for deployment. Fusion aims to lower inference costs by reducing data transactions between an accelerator's on-chip buffer and DRAM. This is accomplished by grouping multiple operations, such as convolutions and activations, into single execution units called fusion groups. However, on-chip buffer capacity limits fusion group size, so optimizing fusion over a whole DNN requires partitioning it into multiple fusion groups. Finding the optimal groups is a complex problem in which the presence of invalid solutions hampers traditional search algorithms and demands robust approaches. In this paper, we incorporate Explainable AI, specifically Graph Explanation Techniques (GET), into layer fusion. Given an invalid fusion group, we identify the operations most responsible for the group's invalidity, then use this knowledge to recursively split the original fusion group via a greedy tree-based algorithm that minimizes DRAM access. We pair our scheme with common algorithms and optimize DNNs on two types of layer fusion: Line-Buffer Depth First (LBDF) and Branch Requirement Reduction (BRR). Experiments demonstrate the efficacy of our scheme on several popular and classical convolutional neural networks, including ResNets and MobileNets. Our scheme achieves over 20% DRAM access reduction on EfficientNet-B3.
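The recursive splitting the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the fusion group is modeled as a list of (operator, buffer cost) pairs, validity is a simple on-chip buffer capacity check, and the GET attribution step is replaced by a placeholder heuristic (scoring each operator by its buffer cost). All names and constants here are hypothetical.

```python
# Hedged sketch of attribution-driven recursive fusion-group partitioning.
# Assumptions (not from the paper): a group is a list of (op_name, cost)
# pairs; GET attribution is stubbed out with a cost-based heuristic.

BUFFER_CAPACITY = 8  # hypothetical on-chip buffer limit (arbitrary units)

def footprint(group):
    """Toy stand-in for a group's on-chip buffer requirement."""
    return sum(cost for _, cost in group)

def is_valid(group):
    """A fusion group is valid if it fits in the on-chip buffer."""
    return footprint(group) <= BUFFER_CAPACITY

def attribution_scores(group):
    """Placeholder for GET: score each operator's responsibility for
    the group's invalidity. Here, simply its buffer cost."""
    return [cost for _, cost in group]

def partition(group):
    """Recursively split an invalid group at the highest-attribution
    operator, yielding a list of valid subgroups (greedy tree descent)."""
    if not group:
        return []
    if is_valid(group):
        return [group]
    # Split just before the most-responsible operator, so the heavy
    # operator starts a new subgroup and both sides shrink.
    scores = attribution_scores(group)
    idx = max(range(len(group)), key=scores.__getitem__)
    left, right = group[:idx], group[idx:]
    if not left:  # highest-attribution op is first: isolate it
        left, right = group[:1], group[1:]
    return partition(left) + partition(right)
```

For example, `partition([("conv1", 3), ("relu", 1), ("conv2", 6), ("add", 2)])` exceeds the toy capacity of 8 and splits before `conv2`, producing two valid subgroups. The real method would additionally compare candidate splits by their DRAM-access cost under the LBDF or BRR fusion pattern rather than accepting the first valid decomposition.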
Problem

Research questions and friction points this paper is trying to address.

Deep Neural Network Optimization
Layer Fusion Technique
Memory Access Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph Explanation Techniques
Layer Fusion Optimization
Memory Access Reduction