GPU Memory and Utilization Estimation for Training-Aware Resource Management: Opportunities and Limitations

📅 2026-02-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the performance degradation and GPU memory overflow commonly caused by resource contention when co-locating deep learning training tasks, highlighting the urgent need for accurate, low-overhead estimation of GPU memory usage and utilization. For the first time, it systematically evaluates the accuracy, generalizability, intrusiveness, and latency of three mainstream estimation approaches—analytical models, CPU-side libraries, and machine learning methods—across diverse hardware platforms and model architectures. A synthetic benchmark dataset encompassing MLPs, CNNs, and Transformers is constructed to enable cross-generational empirical analysis. The findings reveal critical limitations: analytical models exhibit strong hardware dependency, CPU-side libraries require intrusive integration, and machine learning methods generalize poorly across architectures. The authors open-source the complete toolchain and dataset to support training-aware resource scheduling.

📝 Abstract
Collocating deep learning training tasks improves GPU utilization but causes drastic slowdowns due to resource contention and risks Out-of-Memory (OOM) failures. Accurate memory estimation is essential for robust collocation, while GPU utilization -- a key proxy for resource contention -- enables interference-aware scheduling to reduce slowdowns and improve throughput. Existing GPU memory estimators span three paradigms -- analytical models, CPU-side libraries, and ML-based estimators -- each with distinct limitations: dependence on detailed model specifications, intrusive integration, poor generalization, and varying latency overhead. GPU heterogeneity further complicates estimation, as identical tasks can exhibit markedly different memory footprints across hardware generations. GPU utilization remains comparatively understudied, further complicated by the non-additive nature of utilization metrics and hardware sensitivity. We conduct a systematic analysis of representative estimators from each paradigm -- Horus, PyTorch FakeTensor, and our lightweight ML-based estimator -- evaluating accuracy, generalizability, and practical overhead. We construct a synthetic dataset spanning MLPs, CNNs, and Transformers with controlled architectural variations, and train MLP- and Transformer-based estimators for memory prediction. We further experiment with utilization estimation on the same dataset. Our evaluation reveals key tradeoffs and validates estimators against real-world unseen models. Significant challenges remain: analytical models are hardware-dependent, CPU-side libraries impose intrusive integration costs, and ML-based estimators struggle with cross-architecture generalization. We release all datasets, tools, and artifacts to support further research.
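The analytical paradigm the abstract describes can be illustrated with a first-order memory model (a hedged sketch, not the actual Horus formula): peak training memory is roughly weights + gradients + optimizer states + activations. The function name, default byte sizes, and the Adam state count below are illustrative assumptions, not values from the paper.

```python
def estimate_training_memory_bytes(
    n_params: int,
    activation_bytes: int,
    dtype_bytes: int = 4,       # fp32 weights/gradients (assumption)
    optimizer_states: int = 2,  # Adam keeps 2 extra fp32 states per param
) -> int:
    """First-order analytical estimate of peak training memory.

    weights + gradients + optimizer states + activations.
    Deliberately ignores allocator fragmentation, cuDNN workspace
    buffers, and framework overhead -- the hardware-dependent terms
    that make analytical models fragile across GPU generations.
    """
    weights = n_params * dtype_bytes
    gradients = n_params * dtype_bytes
    opt_states = n_params * dtype_bytes * optimizer_states
    return weights + gradients + opt_states + activation_bytes


# Example: a 7M-parameter model with 500 MiB of activations
est = estimate_training_memory_bytes(7_000_000, 500 * 1024**2)
```

The hardware-dependent terms omitted here are exactly what the paper identifies as the source of analytical models' strong hardware dependency.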
Problem

Research questions and friction points this paper is trying to address.

GPU memory estimation
GPU utilization
training task collocation
resource contention
Out-of-Memory (OOM)
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU memory estimation
GPU utilization
training-aware scheduling
resource contention
cross-architecture generalization
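The ML-based estimator paradigm listed above can be sketched, under simplifying assumptions, as a regression from architecture features to measured memory. The toy version below fits a single feature (parameter count) by ordinary least squares on profiled samples; the function names and feature choice are illustrative, not the paper's actual MLP/Transformer estimators. An estimator fit on one GPU generation bakes in that hardware's allocator behavior, which is one intuition for the poor cross-architecture generalization the paper reports.

```python
def fit_linear_estimator(samples):
    """Fit memory_bytes ~ a * n_params + b by ordinary least squares.

    `samples` is a list of (n_params, measured_bytes) pairs, e.g.
    profiled on a single GPU generation.  Closed-form simple linear
    regression; no external libraries needed.
    """
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * y for x, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b


def predict(a, b, n_params):
    """Predict memory (bytes) for an unseen model's parameter count."""
    return a * n_params + b
```

Real estimators would use richer features (layer types, batch size, sequence length) and nonlinear models, but the failure mode is the same: coefficients fit on one architecture family or GPU transfer poorly to another.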