FADiff: Fusion-Aware Differentiable Optimization for DNN Scheduling on Tensor Accelerators

📅 2025-11-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Co-optimizing intra-layer mapping and inter-layer fusion for DNNs—especially LLMs—on tensor accelerators remains challenging due to the discrete, hardware-constrained design space. Method: This paper proposes the first fusion-aware, differentiable end-to-end optimization framework. It introduces a unified, differentiable analytical cost model that encodes hardware constraints—including memory bandwidth and compute unit limitations—as continuous, differentiable loss terms, enabling efficient gradient-based search over discrete mapping and fusion decisions. Contribution/Results: The framework achieves joint optimization of mapping strategies and fusion schedules while preserving model accuracy. Experiments on real tensor accelerators demonstrate an average 2.1× improvement in energy efficiency and 1.8× reduction in inference latency over state-of-the-art methods, validating its generalizability and practicality.

Technology Category

Application Category

📝 Abstract
Efficient deployment of Deep Neural Networks (DNNs), such as Large Language Models (LLMs), on tensor accelerators is essential for maximizing computational efficiency in modern AI systems. However, achieving this is challenging due to the enormous and complex design space created by the interaction of intra-layer mapping and inter-layer fusion. In this work, we present FADiff, a gradient-based optimization framework capable of automatically identifying high-quality intra-layer mapping and inter-layer fusion strategies to accelerate inference for DNN workloads. We first construct a unified and differentiable analytical cost model, which accurately predicts the energy and latency of both single-layer mappings and various layer fusion strategies. Then, by encoding discrete constraints into the loss function, we employ a gradient-based approach to efficiently explore the vast design space, determining the optimal joint strategy for mapping and fusion. Experimental results demonstrate the superiority of FADiff, achieving better optimization in terms of energy and latency compared to existing methods.
Problem

Research questions and friction points this paper is trying to address.

Optimizes DNN scheduling on tensor accelerators
Automates intra-layer mapping and inter-layer fusion strategies
Enhances inference efficiency via gradient-based optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable optimization framework for DNN scheduling
Unified cost model for mapping and fusion strategies
Gradient-based search for joint mapping and fusion optimization
🔎 Similar Papers
No similar papers found.
S
Shuao Jia
Beijing University of Posts and Telecommunications
Z
Zichao Ling
Beijing University of Posts and Telecommunications
C
Chen Bai
The Hong Kong University of Science and Technology
K
Kang Zhao
Beijing University of Posts and Telecommunications
Jianwang Zhai
Jianwang Zhai
Beijing University of Posts and Telecommunications
Machine Learning for EDAPower ModelingDesign Space ExplorationPhysical Design