Literature Study on Operational Data Analytics Frameworks in Large-scale Computing Infrastructures

Date: 2026-03-19
AI Summary
This study addresses the growing complexity of operating large-scale computing infrastructures, such as high-performance computing (HPC) systems, where existing operational data analytics (ODA) frameworks do not fully cover multi-layered, distributed graph-processing ecosystems. The work systematically reviews the core components that enable ODA, compares prevailing ODA frameworks, and proposes a holistic ODA framework that maps onto the layers of the distributed graph-processing ecosystem introduced by Sherif Sakr et al., extending the functionality of the ODA framework proposed by Netti et al. By bringing together fine-grained monitoring, ODA architecture, and graph-processing system design, the proposed framework aims to broaden the scope and extensibility of ODA. The study also summarises the operational efficiencies reported for existing state-of-the-art ODA frameworks and outlines key research directions for ODA in HPC environments.

Abstract
By 2025, zettabytes of data are generated every year. Modern large-scale computing infrastructures such as High-Performance Computing (HPC) systems continue to grow in size and complexity, raising concerns about their manageability and sustainability. For this reason, such systems are equipped with fine-grained monitoring and Operational Data Analytics (ODA) capabilities to optimise their efficiency. In this literature study, we list the fundamental pillars of large-scale computing infrastructures that enable their ODA capabilities, and study popular ODA frameworks operating in a variety of such environments (predominantly HPC). Based on this, we propose a more holistic ODA framework that matches the layers of the large-scale distributed graph-processing ecosystem proposed by Sherif Sakr et al., and that extends the ODA functionalities of an existing novel ODA framework proposed by Netti et al. We compare our holistic ODA framework with several of the state-of-the-art frameworks covered in this study to highlight its novelty, which we hope will encourage more extensive research in this field. To raise awareness, we highlight the significant operational efficiencies observed as a result of implementing state-of-the-art ODA frameworks, illustrating their practical benefits for readers, and lastly, we discuss ongoing research trends in this field.

Problem

Research questions and friction points this paper is trying to address.

Operational Data Analytics
Large-scale Computing Infrastructures
High-Performance Computing
System Manageability
Sustainability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Operational Data Analytics
Holistic Framework
Large-scale Computing
HPC Monitoring
Distributed Graph Processing
Shekhar Suman
Vrije Universiteit Amsterdam, Universiteit van Amsterdam
Xiaoyu Chu
Vrije Universiteit Amsterdam, @Large Research
Alexandru Iosup
Professor of Comp.Sci., VU University Amsterdam
Distributed Systems, Performance Engineering, Cloud Computing, Big Data, Computer Ecosystems