Towards Efficient and Practical GPU Multitasking in the Era of LLM

📅 2025-08-11
🤖 AI Summary
GPUs today largely run one task at a time, which leaves hardware underutilized and adapts poorly to the diverse AI workloads of the large-model era. This paper argues for a systematic GPU multitasking paradigm. Drawing on resource management in CPU operating systems, it envisions an OS-like GPU resource management layer that provides dynamic resource partitioning, strong task isolation, and priority-aware scheduling. It identifies the key requirements and core challenges, including fine-grained resource sharing, low-overhead context switching, and cross-task QoS guarantees, and outlines potential technical pathways for addressing them. The aim is a foundation and an overall architectural framework for evolving GPUs from single-task devices into efficient, secure, and schedulable multitasking compute platforms for high-utilization AI computing.

📝 Abstract
GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet the demands of modern AI workloads. In this work, we highlight the key requirements for GPU multitasking, examine prior efforts, and discuss why they fall short. To advance toward efficient and practical GPU multitasking, we envision a resource management layer, analogous to a CPU operating system, to handle various aspects of GPU resource management and sharing. We outline the challenges and potential solutions, and hope this paper inspires broader community efforts to build the next-generation GPU compute paradigm grounded in multitasking.
Problem

Research questions and friction points this paper is trying to address.

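Address the growing inefficiency of GPU singletasking as hardware capabilities grow and workloads diversify
Bring multitasking to GPUs, as CPUs adopted it decades ago, to meet the demands of modern AI workloads
Propose a resource management layer to handle the challenges of GPU resource management and sharing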
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU multitasking for modern AI workloads
Resource management layer analogous to a CPU operating system
Next-generation GPU compute paradigm