MaLV-OS: Rethinking the Operating System Architecture for Machine Learning in Virtualized Clouds

📅 2025-08-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing research predominantly focuses on leveraging machine learning (ML) to optimize operating system (OS) decisions, overlooking the reverse direction—enhancing ML model performance through OS design. This paper introduces MaLV-OS, an OS architecture specialized for ML workloads in virtualized cloud environments. Methodologically, it proposes the "ML-specialized OS" paradigm by offloading system-sensitive parts of models—including memory layout optimization and GPU kernel scheduling—to the OS layer; it envisions Micro-LAKE, a micro-kernel that lets kernel-space applications use the GPU, together with GPU virtualization merged into the hypervisor; and it designs a dynamically loadable ML-as-a-Service (MLaaS) subsystem enabling model-aware, adaptive resource scheduling. The envisioned design aims to reduce ML task latency and programming overhead, improve throughput and resource utilization, and support runtime policy switching.

📝 Abstract
A large body of research has employed Machine Learning (ML) models to develop learned operating systems (OSes) and kernels. These systems dynamically adapt to the job load and adjust resource allocation (CPU, I/O, memory, network bandwidth) to respond to actual user demand. What this work has in common is that it uses ML to improve kernel decisions. To this day, and to the best of our knowledge, no work has taken the opposite direction, i.e., using the OS to improve ML. While some work proposes applying system-level optimizations to ML algorithms, it does not tailor the OS to the ML context. To address this limitation, we take an orthogonal approach in this paper by leveraging the OS to enhance the performance of ML models and algorithms. We explore the path towards an ML-specialized OS, MaLV-OS. MaLV-OS rethinks the OS architecture to make it specifically tailored to ML workloads, especially in virtualized clouds, which are now widely used to run ML applications. MaLV-OS's envisioned architecture includes (1) a micro-kernel, Micro-LAKE, which allows kernel-space applications to use the GPU, and (2) an MLaaS (ML as a Service) subsystem that gathers ML models to help Micro-LAKE with memory management and CPU scheduling. The MaLV-OS architecture also offloads system-sensitive parts of models to the OS, to reduce model complexity and programming effort and to speed up execution. Finally, MaLV-OS integrates open-source GPU virtualization software, merged directly into the hypervisor. For more flexibility, MaLV-OS's vision is to enable the virtual machine to dynamically select MLaaS policies that can improve the performance of the model the user is running. Because MLaaS is designed as loadable kernel modules, the MaLV-OS architecture enables the dynamic addition of new capabilities to the MLaaS subsystem.
Problem

Research questions and friction points this paper is trying to address.

Rethinking OS architecture for ML workloads in virtualized clouds
Using OS to enhance ML model and algorithm performance
Designing ML-specialized OS with dynamic resource adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Micro-kernel enables GPU use in kernel space
MLaaS subsystem optimizes memory and CPU
GPU virtualization integrated into hypervisor