BanditWare: A Contextual Bandit-based Framework for Hardware Prediction

📅 2025-06-16
🤖 AI Summary
Suboptimal hardware allocation in distributed systems causes resource contention, performance degradation, increased latency, and poor energy efficiency. To address this, the paper proposes BanditWare, an online recommendation framework for dynamic hardware selection. It applies contextual multi-armed bandits (contextual MAB) to hardware selection for the first time, learning continually online without an offline training phase and managing the exploration-exploitation trade-off in a principled way, in contrast to conventional data-intensive approaches. The framework models real-time performance feedback and is designed to integrate with the National Data Platform (NDP). Evaluated on three realistic workloads (Cycles, BurnPro3D, and matrix multiplication), the framework achieves improved resource utilization, reduces end-to-end latency by 27.4% on average, and mitigates priority inversion and system instability.

📝 Abstract
Distributed computing systems are essential for meeting the demands of modern applications, yet transitioning from single-system to distributed environments presents significant challenges. Misallocating resources in shared systems can lead to resource contention, system instability, degraded performance, priority inversion, inefficient utilization, increased latency, and environmental impact. We present BanditWare, an online recommendation system that dynamically selects the most suitable hardware for applications using a contextual multi-armed bandit algorithm. BanditWare balances exploration and exploitation, gradually refining its hardware recommendations based on observed application performance while continuing to explore potentially better options. Unlike traditional statistical and machine learning approaches that rely heavily on large historical datasets, BanditWare operates online, learning and adapting in real time as new workloads arrive. We evaluated BanditWare on three workflow applications: Cycles (an agricultural science workflow), BurnPro3D (a web-based platform for fire science), and a matrix multiplication application. Designed for seamless integration with the National Data Platform (NDP), BanditWare enables users of all experience levels to optimize resource allocation efficiently.

Problem

Research questions and friction points this paper is trying to address.

Dynamic hardware selection for distributed systems
Reducing resource contention and performance degradation
Online learning for real-time workload adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses contextual bandit for hardware prediction
Balances exploration and exploitation dynamically
Operates online without large historical datasets
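
The abstract does not say which contextual bandit variant BanditWare uses, so the following is only a minimal sketch of the general idea under stated assumptions: an epsilon-greedy policy with one online linear runtime model per hardware option. The class name, the `simulate` helper, the hardware labels "small" and "large", and the synthetic runtime functions are all hypothetical and do not come from the paper.

```python
import random


class EpsilonGreedyHardwareBandit:
    """Illustrative epsilon-greedy contextual bandit (not BanditWare's code).

    Keeps one linear runtime model per hardware option and updates it online.
    """

    def __init__(self, hardware, n_features, epsilon=0.2, step=0.5, seed=0):
        self.hardware = list(hardware)
        self.epsilon = epsilon  # probability of trying a random option
        self.step = step        # learning rate of the normalized update
        self.rng = random.Random(seed)
        # One weight vector per hardware option: [bias, w_1, ..., w_n].
        self.weights = {h: [0.0] * (n_features + 1) for h in self.hardware}

    def predict_runtime(self, hw, context):
        w = self.weights[hw]
        return w[0] + sum(wi * xi for wi, xi in zip(w[1:], context))

    def recommend(self, context):
        # Explore: occasionally pick a random hardware option.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.hardware)
        # Exploit: pick the option with the lowest predicted runtime.
        return min(self.hardware, key=lambda h: self.predict_runtime(h, context))

    def update(self, hw, context, observed_runtime):
        # Normalized least-mean-squares step on the chosen option's model only.
        x = [1.0] + list(context)
        error = self.predict_runtime(hw, context) - observed_runtime
        scale = self.step * error / sum(xi * xi for xi in x)
        w = self.weights[hw]
        for i, xi in enumerate(x):
            w[i] -= scale * xi


def simulate(rounds=3000, seed=0):
    """Train on a synthetic workload with two hypothetical hardware options."""
    # Assumed ground truth: runtime as a function of a normalized input size.
    true_runtime = {
        "small": lambda size: 10.0 * size,
        "large": lambda size: 2.0 + 4.0 * size,
    }
    bandit = EpsilonGreedyHardwareBandit(true_runtime.keys(), n_features=1, seed=seed)
    workload = random.Random(seed + 1)
    for _ in range(rounds):
        context = [workload.random()]          # features of the arriving job
        hw = bandit.recommend(context)         # choose hardware
        bandit.update(hw, context, true_runtime[hw](context[0]))  # learn from feedback
    bandit.epsilon = 0.0  # stop exploring so later recommendations are greedy
    return bandit
```

In this toy setup the two runtime curves cross at a normalized size of 1/3, so after `simulate()` a trained bandit would be expected to prefer "small" for tiny jobs and "large" for big ones. Feedback is only observed for the option that was actually chosen, which is why some exploration is needed to keep every model up to date.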
Taina Coleman
University of California San Diego, La Jolla, CA 92093, USA
Hena Ahmed
University of California San Diego, La Jolla, CA 92093, USA
Ravi Shende
University of California San Diego, La Jolla, CA 92093, USA
Ismael Perez
University of California San Diego, La Jolla, CA 92093, USA
Ilkay Altintas
SDSC/UCSD
Data Science and Big Data, Workflow Management, Distributed Computing, Provenance, Reproducibility