TX-Digital Twin: Visualizing Supercomputer GPU Performance Data Stream

📅 2026-03-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the need for efficient, intuitive visualization tools for monitoring GPU-accelerated resources in supercomputing systems. The authors combine a 3D game engine with multi-source performance data streams in the TX-Digital Twin, a three-dimensional digital twin of a supercomputer. Building on the earlier version of the platform, they add GPU metrics, including memory usage, temperature, and power draw, to the 3D visualization environment. Using draw call optimization, the system displays these metrics clearly, alongside the platform's live and historical data, while keeping the rendering overhead minimal. The result gives system administrators and users a more direct way to observe and manage GPU workloads.
📝 Abstract
Supercomputers are complex, dynamic systems that serve thousands of users and are built with thousands of compute nodes. Due to the vast amounts of system and performance data needed to accurately capture their status, supercomputers require complex methods to monitor, maintain, and optimize. Data visualization is a powerful technique for overseeing these large streams of data in an easily interpretable way. The MIT Lincoln Laboratory Supercomputing Center (LLSC) enables effective monitoring by combining 3D gaming technology with compound data streams in the TX-Digital Twin, a 3D simulation of the supercomputer. The TX-Digital Twin offers both live and historical data, in visual and text formats, and tracks a multitude of revealing performance metrics. Growing interest in GPU-accelerated computing has driven a need for monitoring and maintenance of GPU-accelerated resources in supercomputers. In this paper, we build on our previous solution by integrating the visualization of additional GPU metrics, such as GPU memory usage, temperature, and power draw, into the TX-Digital Twin. Using draw call optimization techniques, we add clear and effective displays of the new metrics while keeping the effects on performance minimal.
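The abstract names draw call optimization without describing the technique used. As a rough illustration only, one common form of it is batching: instead of issuing one draw call per indicator per GPU, the per-GPU values are packed into a single buffer that a renderer could consume in one instanced call. The sketch below shows only the packing step in plain Python. The `GpuSample` type, the node names, and the display ranges are hypothetical, and nothing here is taken from the TX-Digital Twin implementation.

```python
from array import array
from dataclasses import dataclass


@dataclass
class GpuSample:
    """One reading for one GPU (hypothetical structure, not from the paper)."""
    node: str
    gpu_index: int
    mem_used_frac: float   # fraction of GPU memory in use, 0..1
    temperature_c: float   # degrees Celsius
    power_w: float         # watts


# Assumed display ranges, chosen for illustration only.
TEMP_RANGE_C = (30.0, 90.0)
POWER_RANGE_W = (0.0, 300.0)


def normalize(value, lo, hi):
    """Clamp value into [lo, hi] and rescale it to [0, 1]."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(max((value - lo) / (hi - lo), 0.0), 1.0)


def pack_instances(samples):
    """Pack three normalized floats per GPU into one flat float32 buffer.

    A renderer could upload this buffer once and draw every GPU indicator
    with a single instanced call, rather than one call per indicator.
    """
    buf = array("f")
    for s in samples:
        buf.extend((
            normalize(s.mem_used_frac, 0.0, 1.0),
            normalize(s.temperature_c, *TEMP_RANGE_C),
            normalize(s.power_w, *POWER_RANGE_W),
        ))
    return buf


samples = [
    GpuSample("node-001", 0, 0.5, 60.0, 150.0),
    GpuSample("node-001", 1, 0.25, 45.0, 75.0),
]
buf = pack_instances(samples)

# Draw call counts under this sketch's assumptions.
naive_draw_calls = len(samples) * 3   # one call per metric per GPU
batched_draw_calls = 1                # one instanced call for all GPUs
```

With thousands of GPUs, the naive count grows linearly with the number of indicators, while the batched count stays constant. That is the general motivation for this kind of optimization, whatever specific method the authors chose.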
Problem

Research questions and friction points this paper is trying to address.

GPU monitoring
supercomputer visualization
performance metrics
Digital Twin
GPU-accelerated computing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital Twin
GPU monitoring
data visualization
draw call optimization
supercomputing
Elena Baskakova
MIT
William Bergeron
MIT
Matthew Hubbell
MIT
Hayden Jananthan
MIT
Jeremy Kepner
MIT Lincoln Laboratory Supercomputing Center
high performance computing
supercomputing
signal processing
matlab
graph algorithms