A Unified Ontology for Scalable Knowledge Graph-Driven Operational Data Analytics in High-Performance Computing Systems

📅 2025-07-08
🤖 AI Summary
HPC systems generate massive volumes of heterogeneous telemetry data, yet existing operational data analytics (ODA) pipelines rely on schema-less storage, impeding semantic integration and cross-platform interoperability. To address this, we propose UHPC-Onto, the first unified ontology model for HPC operations analytics, enabling semantic standardization and knowledge graph (KG) integration across diverse platforms (e.g., M100, F-DATA). Our approach combines ontology modeling refinement with lightweight KG construction, reducing storage overhead by up to 38.84% compared to a previous approach, with an additional reduction of up to 26.82% depending on the deployment configuration. UHPC-Onto is validated through 36 competency questions and provides a scalable, semantically rich infrastructure for efficient, interpretable, cross-datacenter telemetry analysis, particularly under complex workloads such as generative AI.

📝 Abstract
Modern high-performance computing (HPC) systems generate massive volumes of heterogeneous telemetry data from millions of sensors monitoring compute, memory, power, cooling, and storage subsystems. As HPC infrastructures scale to support increasingly complex workloads, including generative AI, the need for efficient, reliable, and interoperable telemetry analysis becomes critical. Operational Data Analytics (ODA) has emerged to address these demands; however, the reliance on schema-less storage solutions limits data accessibility and semantic integration. Ontologies and knowledge graphs (KG) provide an effective way to enable efficient and expressive data querying by capturing domain semantics, but they face challenges such as significant storage overhead and the limited applicability of existing ontologies, which are often tailored to specific HPC systems only. In this paper, we present the first unified ontology for ODA in HPC systems, designed to enable semantic interoperability across heterogeneous data centers. Our ontology models telemetry data from the two largest publicly available ODA datasets, M100 (Cineca, Italy) and F-DATA (Fugaku, Japan), within a single data model. The ontology is validated through 36 competency questions reflecting real-world stakeholder requirements, and we introduce modeling optimizations that reduce knowledge graph (KG) storage overhead by up to 38.84% compared to a previous approach, with an additional 26.82% reduction depending on the desired deployment configuration. This work paves the way for scalable ODA KGs and supports not only analysis within individual systems, but also cross-system analysis across heterogeneous HPC systems.
Problem

Research questions and friction points this paper is trying to address.

Handling massive heterogeneous telemetry data in HPC systems
Overcoming schema-less storage limitations for semantic integration
Reducing storage overhead in knowledge graphs for ODA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified ontology for HPC telemetry data
Model optimizations reduce KG storage overhead
Supports cross-system analysis in HPC
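The cross-system idea above can be sketched with a toy triple store: once telemetry from different machines is expressed in one shared vocabulary, a single query spans both systems. This is a minimal illustration only; the namespace prefix `uhpc:` and all class and property names (`Sensor`, `measures`, `partOf`) are invented for this sketch and need not match the paper's actual ontology terms.

```python
# Toy illustration of a unified telemetry vocabulary across two HPC
# systems (M100 and F-DATA). All ontology terms here are hypothetical.

UHPC = "uhpc:"  # assumed prefix for the shared (unified) ontology namespace

triples = [
    # An M100 (Cineca) power sensor, described with the shared vocabulary
    ("m100:node42_power", UHPC + "type", UHPC + "Sensor"),
    ("m100:node42_power", UHPC + "measures", UHPC + "Power"),
    ("m100:node42_power", UHPC + "partOf", "m100:node42"),
    # An F-DATA (Fugaku) power sensor, using the same classes/properties
    ("fdata:cmg3_power", UHPC + "type", UHPC + "Sensor"),
    ("fdata:cmg3_power", UHPC + "measures", UHPC + "Power"),
    ("fdata:cmg3_power", UHPC + "partOf", "fdata:node17"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching a basic graph pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# A competency-question-style query: "Which sensors measure power?"
# Because both datasets share one vocabulary, the answer spans systems.
power_sensors = [s for s, _, _ in
                 match(triples, p=UHPC + "measures", o=UHPC + "Power")]
print(power_sensors)
```

In a real deployment this role is played by an RDF store and SPARQL queries rather than in-memory tuples, but the mechanism is the same: the unified ontology is what makes one query pattern valid against data from heterogeneous data centers.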