A Study on the Resource Utilization and User Behavior on Titan Supercomputer

📅 2026-05-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study analyzes job scheduling logs, GPU traces, and scientific-domain metadata from the Titan supercomputer, with the aim of improving cluster productivity and informing the design of exascale systems. It examines how requested resources relate to actual utilization and how both change over time. Using correlation analysis, clustering, and neural networks, the work characterizes seasonal patterns in HPC resource usage and builds a predictive model for forecasting utilization. The authors describe the methodology as readily adoptable on other HPC clusters.
📝 Abstract
Understanding how users of HPC facilities behave, and how computational resources are requested and utilized, is crucial for cluster productivity and essential for designing and constructing future exascale HPC systems. This paper tackles Challenge 4, 'Analyzing Resource Utilization and User Behavior on Titan Supercomputer', of the 2021 Smoky Mountains Conference Data Challenge. Specifically, we dig into the records of Titan to discover patterns and extract relationships. We explore the workload distribution and usage patterns in resource manager system logs, GPU traces, and scientific-area information collected from the Titan supercomputer, and we examine how resource utilization and user behavior change over time. Using data science methods such as correlation analysis, clustering, and neural networks, we investigate how projects, jobs, nodes, GPUs, and memory are related. We provide insights into the seasonal usage of resources and a predictive model for forecasting utilization of the Titan supercomputer. In addition, the described methodology can be easily adopted by other HPC clusters.
Problem

Research questions and friction points this paper is trying to address.

resource utilization
user behavior
HPC
workload distribution
Titan supercomputer
Innovation

Methods, ideas, or system contributions that make the work stand out.

resource utilization
user behavior
HPC workload analysis
predictive modeling
data science