VersaSlot: Efficient Fine-grained FPGA Sharing with Big.Little Slots and Live Migration in FPGA Cluster

Published: 2025-03-07
AI Summary
To address the partial reconfiguration contention and task execution blocking that dynamic partial reconfiguration (DPR) causes on shared data-center FPGAs, this paper proposes VersaSlot, a spatio-temporal FPGA sharing system. It introduces a novel Big.Little slot architecture that resolves contention and blocking while improving resource utilization, pairs it with a slot allocation and scheduling algorithm designed for the heterogeneous slots, and supports seamless cross-board switching and live migration. VersaSlot is implemented on a cluster of Xilinx UltraScale+ ZCU216 FPGAs, integrated with a cluster-wide unified scheduler, and compared against four existing scheduling algorithms. Compared with traditional temporal multiplexing, it lowers average response time by up to 13.66×; compared with state-of-the-art spatio-temporal sharing systems, it improves average response time by up to 2.19×. It also raises LUT and FF utilization by 35% and 29% on average, respectively.

Abstract
As FPGAs gain popularity for on-demand application acceleration in data center computing, dynamic partial reconfiguration (DPR) has become an effective fine-grained sharing technique for FPGA multiplexing. However, current FPGA sharing suffers from partial reconfiguration contention and task execution blocking introduced by DPR, both of which significantly degrade application performance. In this paper, we propose VersaSlot, an efficient spatio-temporal FPGA sharing system with a novel Big.Little slot architecture that effectively resolves contention and task blocking while improving resource utilization. For the heterogeneous Big.Little architecture, we introduce an efficient slot allocation and scheduling algorithm, along with a seamless cross-board switching and live migration mechanism, to maximize FPGA multiplexing across the cluster. We evaluate the VersaSlot system on an FPGA cluster composed of the latest Xilinx UltraScale+ FPGAs (ZCU216) and compare its performance against four existing scheduling algorithms. The results demonstrate that VersaSlot achieves up to 13.66x lower average response time than traditional temporal FPGA multiplexing, and up to 2.19x better average response time than state-of-the-art spatio-temporal sharing systems. Furthermore, VersaSlot improves LUT and FF resource utilization by 35% and 29% on average, respectively.
Problem

Research questions and friction points this paper is trying to address.

Resolves partial reconfiguration contention in FPGA sharing.
Addresses task execution blocking in dynamic partial reconfiguration.
Improves FPGA resource utilization with Big.Little slot architecture.
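To make the first two points concrete, here is a minimal sketch of reconfiguration contention. It assumes a single configuration port that loads one partial bitstream at a time and that every task already has a free slot; the `Task` fields and all timings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    arrival_ms: float   # when the task requests a slot
    pr_ms: float        # time to load its partial bitstream (made-up value)
    exec_ms: float      # kernel run time once loaded (made-up value)

def simulate_single_port(tasks):
    """Serialize partial reconfigurations through one configuration port.

    Assumes a free slot exists for every task, so the port is the only
    shared resource. Returns {task name: response time in ms}.
    """
    port_free_at = 0.0
    response = {}
    for t in sorted(tasks, key=lambda t: t.arrival_ms):
        pr_start = max(t.arrival_ms, port_free_at)  # wait for the port
        port_free_at = pr_start + t.pr_ms           # port busy while loading
        finish = port_free_at + t.exec_ms
        response[t.name] = finish - t.arrival_ms
    return response

tasks = [
    Task("a", 0.0, 10.0, 5.0),
    Task("b", 0.0, 10.0, 5.0),
    Task("c", 0.0, 10.0, 5.0),
]
result = simulate_single_port(tasks)
print(result)
```

Even with three free slots, the last task in this toy model waits for two earlier reconfigurations before its own can start, which is the kind of delay the Problem list refers to.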
Innovation

Methods, ideas, or system contributions that make the work stand out.

Big.Little slot architecture for FPGA sharing
Efficient slot allocation and scheduling algorithm
Seamless cross-board switching and live migration
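The summary does not describe the allocation algorithm itself. The sketch below is only a toy best-fit policy, assuming one large and two small slots with invented LUT capacities, to illustrate why mixing slot sizes can raise utilization. `Slot`, `best_fit`, and `lut_utilization` are hypothetical names, not VersaSlot's API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Slot:
    name: str
    luts: int                    # LUT capacity of the slot (made-up numbers)
    task: Optional[str] = None   # task currently occupying the slot, if any

def best_fit(slots, task_name, task_luts):
    """Place a task in the smallest free slot that can hold it.

    Returns the chosen slot, or None if no free slot is large enough.
    """
    candidates = [s for s in slots if s.task is None and s.luts >= task_luts]
    if not candidates:
        return None
    chosen = min(candidates, key=lambda s: s.luts)
    chosen.task = task_name
    return chosen

def lut_utilization(slots, demand):
    """Fraction of the LUTs in occupied slots that tasks actually use."""
    used = sum(demand[s.task] for s in slots if s.task is not None)
    reserved = sum(s.luts for s in slots if s.task is not None)
    return used / reserved if reserved else 0.0

slots = [Slot("big0", 40000), Slot("little0", 10000), Slot("little1", 10000)]
demand = {"fft": 8000, "aes": 9000, "cnn": 35000}  # hypothetical workloads
placement = {name: best_fit(slots, name, luts).name
             for name, luts in demand.items()}
util = lut_utilization(slots, demand)
print(placement, round(util, 3))
```

In this toy setup the two small kernels land in the small slots and the large kernel takes the large slot, so 52,000 of 60,000 reserved LUTs are used. Three uniform 40,000-LUT slots would reserve 120,000 LUTs for the same tasks.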
Authors
Jianfeng Gu
Department of Computer Engineering, Technical University of Munich
Distributed System, Cloud Computing, Autonomous Driving System
Hao Wang
Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Munich, Germany
Xiaorang Guo
Technical University of Munich
Hardware Design, Computer Architecture, Quantum Computing, Biomedical Circuits
Martin Schulz
Technical University of Munich
Computer Architecture and Parallel Systems
Michael Gerndt
Chair of Computer Architecture and Parallel Systems, Technical University of Munich, Munich, Germany