🤖 AI Summary
To balance resource utilization and performance when deploying Mixture-of-Experts Vision Transformers (MoE-ViT) on FPGAs, this paper proposes an end-to-end configurable hardware architecture. The design pairs a latency-optimized streaming attention kernel with a resource-efficient, reusable linear kernel, enabling dynamic expert routing and fine-grained parallelism. A two-stage heuristic search algorithm automatically tunes the architectural parameters under the resource constraints of each target platform. Combining streaming computation, mixed-precision execution, and low-latency memory-access optimization, the implementation achieves 1.34× and 3.35× throughput improvements and 1.75× and 1.54× energy-efficiency gains on Xilinx ZCU102 and Alveo U280 FPGAs, respectively, significantly outperforming state-of-the-art FPGA-based MoE-ViT accelerators.
📝 Abstract
Compared to traditional Vision Transformers (ViT), Mixture-of-Experts Vision Transformers (MoE-ViT) scale model size without a proportional increase in computational complexity, making them a new research focus. Given their high performance and reconfigurability, FPGA-based accelerators for MoE-ViT have emerged, delivering substantial gains over general-purpose processors. However, existing accelerators often fall short of fully exploring the design space, leading to suboptimal trade-offs between resource utilization and performance. To overcome this problem, we introduce UbiMoE, a novel end-to-end FPGA accelerator tailored for MoE-ViT. Leveraging the unique computational and memory-access patterns of MoE-ViTs, we develop a latency-optimized streaming attention kernel and a resource-efficient reusable linear kernel, effectively balancing performance and resource consumption. To further enhance design efficiency, we propose a two-stage heuristic search algorithm that optimally tunes hardware parameters for various FPGA resource constraints. Compared to state-of-the-art (SOTA) FPGA designs, UbiMoE achieves 1.34× and 3.35× throughput improvements for MoE-ViT on the Xilinx ZCU102 and Alveo U280 platforms, respectively, while enhancing energy efficiency by 1.75× and 1.54×. Our implementation is available at https://github.com/DJ000011/UbiMoE.
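To illustrate the idea behind the two-stage heuristic search, the following is a minimal sketch: a coarse sweep over kernel parallelism factors under per-platform resource budgets, followed by fine-grained refinement around the coarse optimum. All names, resource models, and cost functions here are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Hedged sketch of a two-stage heuristic search for accelerator parameters.
# The resource and throughput models below are toy stand-ins, not UbiMoE's.
from itertools import product

# Hypothetical per-platform resource budgets (DSP slices, BRAM blocks).
BUDGETS = {"zcu102": {"dsp": 2520, "bram": 912},
           "u280":   {"dsp": 9024, "bram": 4032}}

def resources(pe_attn, pe_linear):
    """Toy resource model: cost grows linearly with parallelism factors."""
    return {"dsp": 16 * pe_attn + 8 * pe_linear,
            "bram": 4 * pe_attn + 2 * pe_linear}

def throughput(pe_attn, pe_linear):
    """Toy performance model: limited by the slower of the two kernels."""
    return min(3.0 * pe_attn, 2.0 * pe_linear)

def two_stage_search(platform, coarse_step=8):
    budget = BUDGETS[platform]
    fits = lambda p: all(resources(*p)[r] <= budget[r] for r in budget)
    # Stage 1: coarse sweep over a widely spaced grid of parallelism factors.
    coarse = [p for p in product(range(coarse_step, 257, coarse_step), repeat=2)
              if fits(p)]
    best = max(coarse, key=lambda p: throughput(*p))
    # Stage 2: fine-grained local search around the coarse optimum.
    lo = lambda v: max(1, v - coarse_step)
    fine = [p for p in product(range(lo(best[0]), best[0] + coarse_step),
                               range(lo(best[1]), best[1] + coarse_step))
            if fits(p)]
    return max(fine, key=lambda p: throughput(*p))
```

The two stages keep the search tractable: the coarse grid prunes most of the design space cheaply, and the local refinement recovers the precision a coarse grid loses, which is useful when the same flow must retarget platforms with very different resource envelopes (e.g. ZCU102 vs. U280).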