BanditWare: A Contextual Bandit-based Framework for Hardware Prediction

📅 2025-06-16

📈 Citations: 0

✨ Influential: 0

career value

256K/year

🤖 AI Summary

To address resource contention, performance degradation, increased latency, and poor energy efficiency arising from suboptimal hardware resource allocation in distributed systems, this paper proposes an online recommendation framework for dynamic hardware adaptation. It introduces, for the first time, contextual multi-armed bandits (Contextual MAB) to hardware selection—enabling offline-training-free, online continual learning with principled exploration-exploitation trade-offs, thereby departing from conventional data-intensive paradigms. The framework integrates real-time performance feedback modeling and native interfaces to NDP platforms, ensuring zero-friction deployment. Evaluated on three realistic workloads—Cycles, BurnPro3D, and matrix multiplication—the framework achieves significantly improved resource utilization, reduces end-to-end latency by 27.4% on average, and effectively mitigates priority inversion and system instability.

Technology Category

Application Category

📝 Abstract

Distributed computing systems are essential for meeting the demands of modern applications, yet transitioning from single-system to distributed environments presents significant challenges. Misallocating resources in shared systems can lead to resource contention, system instability, degraded performance, priority inversion, inefficient utilization, increased latency, and environmental impact. We present BanditWare, an online recommendation system that dynamically selects the most suitable hardware for applications using a contextual multi-armed bandit algorithm. BanditWare balances exploration and exploitation, gradually refining its hardware recommendations based on observed application performance while continuing to explore potentially better options. Unlike traditional statistical and machine learning approaches that rely heavily on large historical datasets, BanditWare operates online, learning and adapting in real-time as new workloads arrive. We evaluated BanditWare on three workflow applications: Cycles (an agricultural science scientific workflow) BurnPro3D (a web-based platform for fire science) and a matrix multiplication application. Designed for seamless integration with the National Data Platform (NDP), BanditWare enables users of all experience levels to optimize resource allocation efficiently.

Problem

Research questions and friction points this paper is trying to address.

Dynamic hardware selection for distributed systems

Reducing resource contention and performance degradation

Online learning for real-time workload adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses contextual bandit for hardware prediction

Balances exploration and exploitation dynamically

Operates online without large historical datasets

🔎 Similar Papers

No similar papers found.

💼 Related Jobs

Software Dev Engineer, EC2 Nitro

Amazon

USA, WA, Seattle - 143,700.00 - 194,400.00 USD annually

Seattle, WA, USA

Authors to Follow