🤖 AI Summary
In interactive deep learning training (IDLT), Jupyter-style notebooks often monopolize GPU resources for extended periods, leaving expensive hardware idle and driving up infrastructure costs. To address this, we propose the first dynamic GPU scheduling framework tailored for IDLT: a Raft-based three-replica kernel architecture enabling GPU oversubscription, runtime GPU binding, and seamless kernel migration, marking the first integration of OS-level resource management mechanisms into interactive AI development environments. The system also performs workload-aware, automated cluster scaling. Evaluated on 17.5 hours of real-world IDLT workloads, it reduces cumulative GPU time by over 1,187 hours, significantly improves average GPU utilization, lowers training response latency, and preserves interactive real-time responsiveness.
📝 Abstract
Interactive notebook programming is ubiquitous in modern ML (machine learning) and AI (artificial intelligence) workflows. Notebook software like Jupyter and Google Colab provides a user-friendly, interactive, web-based programming interface and is widely used across science and engineering domains. A dominant application of production notebook workloads is interactive deep learning training (IDLT). To guarantee high interactivity, modern notebook platforms typically reserve GPU resources within actively running notebook sessions. These notebook sessions are long-running but exhibit intermittent and sporadic GPU usage. Consequently, during most of their lifetimes, notebook sessions do not use the reserved GPUs, resulting in extremely low GPU utilization and prohibitively high cost. In this paper, we introduce NotebookOS, a GPU-efficient notebook platform designed to meet the unique requirements of IDLT. NotebookOS uses a replicated notebook kernel design, where each kernel consists of three replicas distributed across separate GPU servers and synchronized via Raft. To optimize GPU utilization, NotebookOS oversubscribes server resources via kernel replication, exploiting the relatively long task inter-arrival times in IDLT workloads. By dynamically allocating GPUs to kernel replicas only while they are actively executing notebook cells, NotebookOS maximizes the likelihood of immediate and interactive training upon notebook-cell task submission. NotebookOS also migrates kernel replicas and automatically scales the GPU cluster under overload conditions. We evaluate NotebookOS extensively using production notebook workloads. Evaluation results show that NotebookOS saves 1,187+ GPU hours over a 17.5-hour real-world IDLT workload while greatly enhancing interactivity.
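The core scheduling idea, binding a GPU to a kernel replica only for the duration of a cell execution so that more sessions can be hosted than there are GPUs, can be illustrated with a minimal sketch. This is not the NotebookOS implementation; the `GpuPool` and `run_cell` names are hypothetical, and Raft replication, migration, and autoscaling are omitted.

```python
import threading

class GpuPool:
    """Hypothetical sketch of dynamic GPU binding: idle sessions hold no
    GPU, so a server can host more sessions than it has GPUs
    (oversubscription)."""

    def __init__(self, num_gpus: int):
        self.free = list(range(num_gpus))
        self.lock = threading.Lock()

    def acquire(self):
        # Bind a GPU at cell-execution time. Returning None models an
        # overload condition; a real system would migrate the replica to
        # another server or scale the cluster out.
        with self.lock:
            return self.free.pop() if self.free else None

    def release(self, gpu: int):
        # Return the GPU as soon as the cell finishes, making it
        # available to other oversubscribed sessions.
        with self.lock:
            self.free.append(gpu)

def run_cell(pool: GpuPool, train_fn):
    """Execute one notebook cell with a GPU bound only for its duration."""
    gpu = pool.acquire()
    if gpu is None:
        raise RuntimeError("overloaded: migrate replica or scale out")
    try:
        return train_fn(gpu)
    finally:
        pool.release(gpu)
```

Because IDLT tasks have long inter-arrival times, a small pool can serve many sessions: a two-GPU pool could back dozens of mostly idle notebook sessions, with `acquire` failing (and triggering migration or scaling) only when cell executions actually overlap.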