Refining Datapath for Microscaling ViTs

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Hardware acceleration of Vision Transformers (ViTs) faces bottlenecks including poor low-precision mapping of numerically sensitive operators (e.g., Softmax, LayerNorm) and high CPU–accelerator communication overhead. This paper introduces Microscaling Integer (MXInt), a novel quantized data format enabling the first end-to-end FPGA deployment of ViTs with full-operator support. MXInt restructures the computational pipeline to enable high-accuracy fixed-point implementation of precision-critical operators; we further design customized pipelined execution and software–hardware co-designed operator mapping. Under <1% top-1 accuracy degradation, our implementation achieves ≥93× speedup over FP16 baselines and ≥1.9× over state-of-the-art accelerators, while significantly improving area efficiency and end-to-end throughput. The core innovation lies in MXInt-driven full-stack co-optimization—extending beyond conventional matrix-multiplication-only acceleration to holistically optimize arithmetic, memory, and control across the entire ViT stack.

📝 Abstract
Vision Transformers (ViTs) leverage the transformer architecture to effectively capture global context, demonstrating strong performance in computer vision tasks. A major challenge in ViT hardware acceleration is that the model family contains complex arithmetic operations that are sensitive to model accuracy, such as the Softmax and LayerNorm operations, which cannot be mapped onto efficient hardware with low precision. Existing methods only exploit parallelism in the matrix multiplication operations of the model on hardware and keep these complex operations on the CPU. This results in suboptimal performance due to the communication overhead between the CPU and accelerator. Can new data formats solve this problem? In this work, we present the first ViT accelerator that maps all operations of the ViT models onto FPGAs. We exploit a new arithmetic format named Microscaling Integer (MXInt) for datapath designs and evaluate how different design choices can be made to trade off accuracy, hardware performance, and hardware utilization. Our contributions are twofold. First, we quantize ViTs using the MXInt format, achieving both high area efficiency and accuracy. Second, we propose MXInt-specific hardware optimizations that map these complex arithmetic operations into custom hardware. Within 1% accuracy loss, our method achieves at least 93× speedup compared to Float16 and at least 1.9× speedup compared to related work.
Problem

Research questions and friction points this paper is trying to address.

Hardware acceleration challenge for Vision Transformers (ViTs) due to complex arithmetic operations
Suboptimal performance from CPU-accelerator communication overhead in existing methods
Need for new data formats to map all ViT operations efficiently onto FPGAs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Microscaling Integer (MXInt) format
Maps all ViT operations onto FPGAs
Optimizes hardware for complex arithmetic operations
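The MXInt idea in the bullets above, a block of values sharing one power-of-two scale with a small integer mantissa per element, can be sketched in software. This is a minimal illustrative sketch: the block size, 8-bit mantissa width, and scale-selection rule are assumptions for demonstration, not the paper's exact configuration.

```python
import math

def mxint_quantize(block, mantissa_bits=8):
    """Quantize one block of floats to MXInt-style values: a shared
    power-of-two scale plus one signed integer mantissa per element.
    (Illustrative sketch; bit widths are assumed, not from the paper.)"""
    max_abs = max(abs(v) for v in block) or 1.0
    # Shared exponent: place the largest magnitude near the top of the
    # signed mantissa range [-(2^(b-1)), 2^(b-1) - 1].
    exp = math.floor(math.log2(max_abs)) - (mantissa_bits - 2)
    scale = 2.0 ** exp
    qmin = -(1 << (mantissa_bits - 1))
    qmax = (1 << (mantissa_bits - 1)) - 1
    # Round each element to an integer mantissa, clipped to range.
    mantissas = [min(max(round(v / scale), qmin), qmax) for v in block]
    # Dequantized values show the reconstruction the hardware would see.
    dequant = [m * scale for m in mantissas]
    return exp, mantissas, dequant

exp, mant, deq = mxint_quantize([0.5, -0.25, 0.125, 1.0])
# Powers of two reconstruct exactly under a power-of-two shared scale.
```

Because the shared scale is a power of two, dequantization on an FPGA reduces to an integer shift rather than a floating-point multiply, which is what makes the format attractive for datapath design.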