Sharing GPUs and Programmable Switches in a Federated Testbed with SHARY

📅 2025-01-31
🤖 AI Summary
In federated testbeds, scarce heterogeneous hardware—such as GPUs, P4-programmable switches, and smart NICs—is shared across institutions, leading to severe resource contention, complex management, and fragmented scheduling. To address these challenges, this paper proposes SHARY, a dynamic reservation system. Its key contributions are: (1) a lightweight hardware abstraction layer that unifies diverse accelerators via protocol-agnostic abstraction; (2) the first integration of a demand-driven GPU sharing model (FIGO) and a P4-switch reservation mechanism (SUP4RNET) into a unified federated scheduling framework; and (3) tight coupling of dynamic reservation, federated identity authentication, and policy-aware coordination. Experimental evaluation demonstrates a 62% reduction in average GPU wait time and sub-200 ms P4 reservation response latency, significantly improving hardware utilization and lowering barriers to cross-institutional AI and network experimentation.

📝 Abstract
Federated testbeds enable collaborative research by providing access to diverse resources, including computing power, storage, and specialized hardware such as GPUs, programmable switches, and smart Network Interface Cards (NICs). Efficiently sharing these resources across federated institutions is challenging, particularly when resources are scarce and costly. GPUs are crucial for AI and machine learning research, but their high demand and expense make efficient management essential. Similarly, advanced experimentation on programmable data planes requires very expensive programmable switches (e.g., based on P4) and smart NICs. This paper introduces SHARY (SHaring Any Resource made easY), a dynamic reservation system that simplifies resource booking and management in federated environments. We show that SHARY can be adopted for heterogeneous resources, thanks to an adaptation layer tailored to the specific resource considered. In particular, it can be integrated with FIGO (Federated Infrastructure for GPU Orchestration), which enhances GPU availability through a demand-driven sharing model. By enabling real-time resource sharing and a flexible booking system, FIGO improves access to GPUs, reduces costs, and accelerates research progress. SHARY can also be integrated with the SUP4RNET platform to reserve access to P4 switches.
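To make the idea of a reservation system with a per-resource adaptation layer more concrete, here is a minimal Python sketch. It is not SHARY's implementation: the names `ResourceAdapter`, `GpuAdapter`, `P4SwitchAdapter`, `Resource`, `Booking`, and `reserve` are all hypothetical, and the only behavior shown is rejecting time slots that overlap an existing booking.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Protocol


class ResourceAdapter(Protocol):
    """Hypothetical adaptation layer: one implementation per resource type."""

    def grant(self, user: str) -> str: ...


class GpuAdapter:
    # Illustrative stand-in for whatever a GPU backend would do on access.
    def grant(self, user: str) -> str:
        return f"GPU node opened for {user}"


class P4SwitchAdapter:
    # Illustrative stand-in for a programmable-switch backend.
    def grant(self, user: str) -> str:
        return f"P4 switch access opened for {user}"


@dataclass
class Booking:
    user: str
    start: datetime
    end: datetime


@dataclass
class Resource:
    name: str
    adapter: ResourceAdapter
    bookings: List[Booking] = field(default_factory=list)


def reserve(resource: Resource, user: str, start: datetime, end: datetime) -> bool:
    """Book the half-open slot [start, end) unless it overlaps an existing booking."""
    if end <= start:
        raise ValueError("end must be after start")
    for b in resource.bookings:
        # Two half-open intervals overlap when each starts before the other ends.
        if start < b.end and b.start < end:
            return False
    resource.bookings.append(Booking(user, start, end))
    return True


if __name__ == "__main__":
    gpu = Resource("gpu-0", GpuAdapter())
    t0 = datetime(2025, 1, 31, 9, 0)
    print(reserve(gpu, "alice", t0, t0 + timedelta(hours=2)))
    print(reserve(gpu, "bob", t0 + timedelta(hours=1), t0 + timedelta(hours=3)))
```

In this sketch the booking logic is the same for every resource, and only the adapter differs between a GPU and a P4 switch, which is one plausible reading of the "adaptation layer" described in the abstract.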
Problem

Research questions and friction points this paper is trying to address.

Federated Testing Platform
Resource Management
Artificial Intelligence Research
Innovation

Methods, ideas, or system contributions that make the work stand out.

SHARY
GPU optimization
P4 switch reservation
Stefano Salsano
Dip. Ingegneria Elettronica - University of Rome Tor Vergata
computer networks, SDN, mobile computing, software defined networking, network function virtualization
Andrea Mayer
University of Rome Tor Vergata, CNIT
Paolo Lungaroni
University of Rome Tor Vergata
Pierpaolo Loreti
University of Rome Tor Vergata, CNIT
Lorenzo Bracciale
University of Rome Tor Vergata, CNIT
Andrea Detti
Professor of Mobile Networks and Cloud Computing at University of Rome "Tor Vergata"
Cloud Computing, Mobile Wireless Networks
Marco Orazi
University of Rome Tor Vergata
Paolo Giaccone
Politecnico di Torino
Fulvio Risso
Politecnico di Torino
SDN, NFV, fog computing, service orchestration, high speed network processing
Alessandro Cornacchia
KAUST (King Abdullah University of Science and Technology)
Carla Fabiana Chiasserini
Full Professor, Politecnico di Torino, Italy
Mobile networks