A Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search

📅 2025-07-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deformable Attention Transformers (DATs) suffer from irregular memory access patterns due to data-dependent sampling, hindering efficient hardware deployment—especially on resource-constrained edge platforms. Method: This paper proposes a precision–hardware co-optimization framework. Its core innovations include: (1) a neural architecture search (NAS)-driven automatic tiling strategy that eliminates memory conflicts without modifying the model architecture; (2) joint optimization of inference accuracy and hardware cost metrics (e.g., DRAM accesses); and (3) end-to-end validation on a Xilinx FPGA platform. Results: On ImageNet-1K, the method incurs only a 0.2% Top-1 accuracy drop while reducing DRAM accesses to just 18% of prior approaches. This yields significantly lower memory overhead and sustained high throughput, establishing a practical pathway for efficient DAT deployment on edge devices.

📝 Abstract
Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access patterns, posing significant challenges for efficient hardware deployment. Existing acceleration methods either incur high hardware overhead or compromise model accuracy. To address these issues, this paper proposes a hardware-friendly optimization framework for DAT. First, a neural architecture search (NAS)-based method with a new slicing strategy is proposed to automatically divide the input feature map into uniform patches during inference, avoiding memory conflicts without modifying the model architecture. The method explores the optimal slice configuration by jointly optimizing hardware cost and inference accuracy. Second, an FPGA-based verification system is designed to evaluate the framework on edge-side hardware. Algorithm experiments on the ImageNet-1K dataset demonstrate that the hardware-friendly framework incurs only a 0.2% accuracy drop compared to the baseline DAT. Hardware experiments on a Xilinx FPGA show the proposed method reduces the number of DRAM accesses to 18% of that of existing DAT acceleration methods.
Problem

Research questions and friction points this paper is trying to address.

Optimize deformable transformer memory access for hardware efficiency
Balance model accuracy and hardware cost via NAS
Reduce the number of DRAM accesses in FPGA-based edge deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

NAS-based slicing for uniform feature patches
Hardware-accuracy co-optimized slice configuration
FPGA verification system reducing DRAM accesses by 82%
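The co-optimization idea above can be illustrated with a toy search. This is a hedged sketch, not the paper's actual algorithm: the feature-map size, candidate slice sizes, and both cost models below are hypothetical stand-ins for the paper's accuracy and DRAM-access metrics.

```python
# Illustrative sketch (hypothetical, not the paper's algorithm): search over
# uniform slice configurations, jointly scoring a DRAM-access proxy and an
# accuracy proxy, mirroring the hardware-accuracy co-optimization idea.
from itertools import product

FEATURE_H, FEATURE_W = 56, 56          # assumed input feature-map size
CANDIDATE_SLICES = [4, 7, 8, 14, 28]   # hypothetical candidate patch sizes

def dram_access_proxy(ph, pw):
    """Toy hardware cost: more (and smaller) patches mean more burst
    transfers; patches that do not tile the map evenly pay a padding penalty."""
    n_patches = (FEATURE_H // ph) * (FEATURE_W // pw)
    padding_penalty = (FEATURE_H % ph + FEATURE_W % pw) * 10
    return n_patches + padding_penalty

def accuracy_proxy(ph, pw):
    """Toy accuracy term: larger patches keep more context for the
    data-dependent sampling, so reward normalized patch area."""
    return (ph * pw) / (FEATURE_H * FEATURE_W)

def joint_cost(ph, pw, alpha=0.5):
    # Lower is better: trade hardware cost against the accuracy proxy.
    return alpha * dram_access_proxy(ph, pw) - (1 - alpha) * 100 * accuracy_proxy(ph, pw)

best = min(product(CANDIDATE_SLICES, repeat=2), key=lambda s: joint_cost(*s))
print("best slice config (h, w):", best)
```

In a real NAS setting the accuracy term would come from evaluating the sliced model on a validation set and the hardware term from a cycle-accurate or analytical FPGA memory model; the exhaustive `min` here stands in for whatever search strategy the framework uses.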
Wendong Mao
Sun Yat-Sen University, Assistant Professor
Artificial Intelligence · Deep Learning · VLSI · Hardware Design · Acceleration
Mingfan Zhao
School of Integrated Circuits, Sun Yat-Sen University, Shenzhen, China
Jianfeng Guan
School of Integrated Circuits, Sun Yat-Sen University, Shenzhen, China
Qiwei Dong
School of Electronic Science and Engineering, Nanjing University, Nanjing, China
Zhongfeng Wang
Nanjing University
VLSI · FEC · DSP · MIMO · Neural Network