SIREN: Software Identification and Recognition in HPC Systems

📅 2025-08-26
🤖 AI Summary
Traditional software identification in HPC systems relies on job names or file names. This breaks down when users choose arbitrary names, and it offers little help in spotting unknown software or repeated executions of the same application, which hinders system optimization and security analysis. SIREN is a process-level data collection framework that combines process metadata, environment information, and fuzzy hashes of executables to identify software in a privacy-preserving way, even when the executable version or compilation approach changes. It supports both similarity-based identification of previously unseen software and recognition of repeated executions of known applications. In a first opt-in deployment campaign on the LUMI supercomputer, SIREN provided insights into software usage, recognized repeated executions of known applications, and identified unknown applications by similarity.

📝 Abstract
HPC systems use monitoring and operational data analytics to ensure efficiency, performance, and orderly operations. Application-specific insights are crucial for analyzing the increasing complexity and diversity of HPC workloads, particularly through the identification of unknown software and the recognition of repeated executions, which facilitate system optimization and security improvements. However, traditional identification methods based on job or file names are unreliable because users can choose arbitrary names (e.g., a.out). Fuzzy hashing of executables overcomes these limitations: it detects similarities despite changes in executable version or compilation approach while preserving privacy and file integrity. We introduce SIREN, a process-level data collection framework for software identification and recognition. SIREN improves observability in HPC by enabling analysis of process metadata, environment information, and executable fuzzy hashes. Findings from a first opt-in deployment campaign on LUMI show SIREN's ability to provide insights into software usage, recognition of repeated executions of known applications, and similarity-based identification of unknown applications.
Problem

Research questions and friction points this paper is trying to address.

Identifying unknown software in HPC systems
Recognizing repeated executions of applications
Overcoming unreliable traditional identification methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process-level data collection framework
Fuzzy hashing for executable similarity detection
Analysis of process metadata and environment information
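
To illustrate why a fuzzy hash can recognize a rebuilt or slightly changed executable where an exact checksum cannot, here is a minimal sketch. It is not SIREN's implementation, and this summary does not say which fuzzy hashing algorithm SIREN uses. As a stand-in, the sketch uses a bottom-k MinHash over byte n-grams. The function names `fuzzy_digest` and `similarity`, the parameters `n` and `k`, and the random test bytes are all assumptions made for the example.

```python
import hashlib
import random


def fuzzy_digest(data: bytes, n: int = 8, k: int = 64) -> list[int]:
    """Return a compact similarity-preserving digest (hypothetical helper).

    Hashes every n-byte window of the input and keeps only the k smallest
    hash values (a bottom-k MinHash sketch). The digest stores hash values,
    not the file content itself.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(data[i:i + n], digest_size=8).digest(), "big"
        )
        for i in range(max(len(data) - n + 1, 1))
    }
    return sorted(hashes)[:k]


def similarity(d1: list[int], d2: list[int]) -> float:
    """Estimate the Jaccard similarity of two inputs from their digests."""
    k = min(len(d1), len(d2))
    if k == 0:
        return 0.0
    # The k smallest values of the merged digests sample the union of both
    # inputs; the share of them present in both digests estimates the overlap.
    merged = sorted(set(d1) | set(d2))[:k]
    shared = set(d1) & set(d2)
    return sum(1 for h in merged if h in shared) / k


# Stand-in byte strings, not real executables (assumption for illustration).
rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(4096))
rebuilt = original[:2000] + b"PATCH-v2" + original[2000:]  # small change
unrelated = bytes(random.Random(1).randrange(256) for _ in range(4096))

print("original vs rebuilt:  ", similarity(fuzzy_digest(original), fuzzy_digest(rebuilt)))
print("original vs unrelated:", similarity(fuzzy_digest(original), fuzzy_digest(unrelated)))
```

A cryptographic checksum of `original` and `rebuilt` would differ completely, whereas the digest comparison above should score them as close and score the unrelated input as distant. Production fuzzy hashes differ in detail but rely on the same idea of comparing compact digests rather than file contents.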
Authors

Thomas Jakobsche
University of Basel, Switzerland
Fredrik Robertsén
CSC IT Center for Science, Finland
Jessica R. Jones
Hewlett Packard Enterprise, United Kingdom
Utz-Uwe Haus
Head of HPE HPC EMEA Research Lab, Switzerland
High Performance Computing
Florina M. Ciorba
University of Basel, Switzerland