Towards Efficient and Practical GPU Multitasking in the Era of LLM

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low GPU utilization and poor adaptability to diverse AI workloads that result from single-task execution in the large-model era, this paper argues for a systematic GPU multitasking paradigm. Inspired by CPU operating system resource management, it envisions an OS-like GPU resource management layer that enables dynamic resource partitioning, strong task isolation, and priority-aware scheduling. It identifies key requirements and core challenges, including fine-grained resource sharing, low-overhead context switching, and cross-task QoS guarantees, and outlines potential solutions to them. The work offers an architectural vision for evolving GPUs from single-task devices toward efficient, secure, and schedulable multitasking compute platforms, in support of high-utilization, high-performance AI computing systems.
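
The summary above mentions dynamic resource partitioning and priority-aware scheduling. As an illustration only, the following Python sketch shows one simple way such a policy could look: compute units are granted to tasks in priority order until the device is full. The paper does not specify this algorithm; the `Task` fields, the `partition` function, the task names, and the figure of 108 compute units are all hypothetical choices made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int   # higher value = more important (assumed convention)
    sm_demand: int  # compute units requested (hypothetical unit)


def partition(tasks: list[Task], total_sms: int) -> dict[str, int]:
    """Grant compute units in descending priority order until none remain."""
    grants: dict[str, int] = {}
    free = total_sms
    # sorted() is stable, so tasks with equal priority keep submission order.
    for task in sorted(tasks, key=lambda t: -t.priority):
        granted = min(task.sm_demand, free)
        grants[task.name] = granted
        free -= granted
    return grants


# Hypothetical workload mix on a device assumed to have 108 compute units.
tasks = [
    Task("batch-finetune", priority=1, sm_demand=100),
    Task("llm-serving", priority=10, sm_demand=60),
    Task("embedding-job", priority=5, sm_demand=40),
]
grants = partition(tasks, total_sms=108)
print(grants)
```

In this toy policy the latency-sensitive serving task gets its full request and the low-priority batch job receives whatever is left. A practical resource manager would also need isolation, preemption, and QoS enforcement, which this sketch does not attempt to model.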

📝 Abstract

GPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet the demands of modern AI workloads. In this work, we highlight the key requirements for GPU multitasking, examine prior efforts, and discuss why they fall short. To advance toward efficient and practical GPU multitasking, we envision a resource management layer, analogous to a CPU operating system, to handle various aspects of GPU resource management and sharing. We outline the challenges and potential solutions, and hope this paper inspires broader community efforts to build the next-generation GPU compute paradigm grounded in multitasking.

Problem

Research questions and friction points this paper is trying to address.

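Address the growing inefficiency of GPU singletasking as hardware capabilities grow and workloads diversify

Bring multitasking to GPUs, as CPUs adopted it decades ago, to meet the demands of modern AI workloads

Propose a resource management layer to handle the challenges of GPU resource sharing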

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU multitasking for modern AI workloads

Resource management layer analogous to a CPU operating system

Next-generation GPU compute paradigm
