Mestra: Exploring Migration on Virtualized CGRAs

📅 2026-04-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in large coarse-grained reconfigurable arrays (CGRAs): a single application rarely utilizes the whole fabric, and multi-tenant sharing suffers from fabric fragmentation and lacks live migration. The authors propose Mestra, the first CGRA virtualization framework supporting live migration of both stateless and stateful kernels, using migration as a defragmentation mechanism. Mestra combines a custom tightly coupled controller, read-back paths, a dynamic scheduler, and a lightweight hardware virtualization architecture, and is evaluated on an Alveo-U280 FPGA card. Compared with a single-tenant baseline, spatial sharing improves workload makespan by up to 70.48%, live kernel migration reduces tail latency on fragmented layouts by up to 29.60%, and the controller and read-back paths cost 0.13% of LUTs per region.

📝 Abstract

As modern Coarse Grain Reconfigurable Arrays (CGRAs) grow in size, efficient utilization of the available fabric by a single application becomes increasingly difficult. Existing CGRA mappers either fail to utilize the available fabric or rely on rigid static code transformations with limited adaptability. Multi-tenant CGRAs have emerged as a promising solution to increase hardware utilization, but current attempts fail to address key challenges such as fabric fragmentation and live migration. To address this gap, we present Mestra, an end-to-end system for CGRA multi-tenancy that supports dynamic scheduling and resource allocation in a shared environment. Mestra addresses fabric fragmentation caused by kernels completing out of order by supporting both stateless and stateful live kernel migration as a de-fragmentation mechanism. We assess our solution on an Alveo-U280 data-center-grade FPGA card, reporting area, frequency, and power. Performance is evaluated using routines from the PolyBench benchmark suite and kernels derived from common machine learning operators. Results show that spatial sharing of the available fabric across multiple users improves workload makespan by up to 70.48%, while live kernel migration reduces tail latency on fragmented layouts by up to 29.60%. The custom tightly coupled controller and read-back paths required for virtualization and stateful migration introduce a LUT cost of 0.13% per region. Our evaluation reveals that multi-tenancy is important for efficient CGRA utilization, and live kernel migration can further improve performance by recovering fragmented space with minimal hardware cost.
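The fragmentation problem described in the abstract can be illustrated with a toy model. The sketch below is not Mestra's scheduler or migration mechanism; it assumes a one-dimensional fabric of equally sized regions, kernels that need contiguous regions, and migration that simply slides running kernels toward one end. All names (`Fabric`, `place`, `release`, `defragment`) are hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Assumption: the fabric is a 1D row of regions; None marks a free region,
# a string marks the kernel that currently occupies it.
Fabric = List[Optional[str]]


def find_contiguous(fabric: Fabric, size: int) -> Optional[int]:
    """Return the start index of the first run of `size` free regions, or None."""
    run = 0
    for i, owner in enumerate(fabric):
        run = run + 1 if owner is None else 0
        if run == size:
            return i - size + 1
    return None


def place(fabric: Fabric, name: str, size: int) -> bool:
    """Place a kernel on the first contiguous free run; report success."""
    start = find_contiguous(fabric, size)
    if start is None:
        return False
    fabric[start:start + size] = [name] * size
    return True


def release(fabric: Fabric, name: str) -> None:
    """Free every region held by a completed kernel."""
    for i, owner in enumerate(fabric):
        if owner == name:
            fabric[i] = None


def defragment(fabric: Fabric) -> int:
    """Slide running kernels toward index 0, keeping their order.

    Returns the number of kernels whose position changed, i.e. the number
    of migrations this toy model would need.
    """
    order: List[str] = []
    for owner in fabric:
        if owner is not None and owner not in order:
            order.append(owner)
    compacted: Fabric = [None] * len(fabric)
    cursor = 0
    moved = 0
    for name in order:
        size = fabric.count(name)
        if fabric.index(name) != cursor:
            moved += 1
        compacted[cursor:cursor + size] = [name] * size
        cursor += size
    fabric[:] = compacted
    return moved


# Four kernels fill an 8-region fabric; A and C then finish out of order.
fabric: Fabric = [None] * 8
for kernel in ("A", "B", "C", "D"):
    place(fabric, kernel, 2)
release(fabric, "A")
release(fabric, "C")

# Four regions are free, but no three of them are adjacent.
fits_before = place(fabric, "E", 3)
migrations = defragment(fabric)
fits_after = place(fabric, "E", 3)
```

In this toy run the new kernel `E` is rejected while the free regions are scattered and accepted once the two remaining kernels have been moved. A real system also has to account for the cost of each migration, which is larger for stateful kernels whose state must be read back and restored.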

Problem

Research questions and friction points this paper is trying to address.

CGRAs

multi-tenancy

fabric fragmentation

live migration

resource utilization

Innovation

Methods, ideas, or system contributions that make the work stand out.

live kernel migration

multi-tenancy

fabric fragmentation