SIREN: Software Identification and Recognition in HPC Systems

📅 2025-08-26

🤖 AI Summary

Traditional software identification in HPC systems relies on job names or file names. These are unreliable when users choose arbitrary names, and they give little help in identifying unknown software or recognizing repeated executions of the same application, which limits system optimization and security analysis. SIREN is a process-level data collection framework that combines process metadata, environment information, and fuzzy hashes of executables. Fuzzy hashing detects similarity between executables despite changes in version or compilation approach while preserving privacy and file integrity. A first opt-in deployment campaign on the LUMI supercomputer shows that SIREN provides insight into software usage, recognizes repeated executions of known applications, and identifies unknown applications by similarity.

📝 Abstract

HPC systems use monitoring and operational data analytics to ensure efficiency, performance, and orderly operations. Application-specific insights are crucial for analyzing the increasing complexity and diversity of HPC workloads, particularly through the identification of unknown software and recognition of repeated executions, which facilitate system optimization and security improvements. However, traditional identification methods based on job or file names are unreliable because users can choose arbitrary names (e.g., a.out). Fuzzy hashing of executables overcomes these limitations: it detects similarities despite changes in executable version or compilation approach while preserving privacy and file integrity. We introduce SIREN, a process-level data collection framework for software identification and recognition. SIREN improves observability in HPC by enabling analysis of process metadata, environment information, and executable fuzzy hashes. Findings from a first opt-in deployment campaign on LUMI show SIREN's ability to provide insights into software usage, recognition of repeated executions of known applications, and similarity-based identification of unknown applications.
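To illustrate why a fuzzy hash can recognize a rebuilt or slightly changed executable where a cryptographic hash cannot, here is a minimal sketch of context-triggered piecewise hashing in Python. It is a toy written for this summary, not SIREN's implementation and not the algorithm of any particular fuzzy hashing tool; the names `toy_fuzzy_hash` and `similarity` and the constants `WINDOW` and `BLOCK_SIZE` are assumptions chosen for the example.

```python
import difflib
import hashlib
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7       # hypothetical: bytes covered by the rolling window
BLOCK_SIZE = 64  # hypothetical: a chunk boundary fires roughly once per 64 bytes


def _chunk_char(chunk: bytes) -> str:
    """Map one chunk to a single digest character."""
    return ALPHABET[hashlib.sha256(chunk).digest()[0] % len(ALPHABET)]


def toy_fuzzy_hash(data: bytes) -> str:
    """Return one character per content-defined chunk of `data`."""
    digest = []
    chunk_start = 0
    for i in range(len(data)):
        # The trigger depends only on the last WINDOW bytes, so chunk
        # boundaries realign shortly after a local modification.
        rolling = sum(data[max(0, i - WINDOW + 1): i + 1])
        if rolling % BLOCK_SIZE == BLOCK_SIZE - 1:
            digest.append(_chunk_char(data[chunk_start: i + 1]))
            chunk_start = i + 1
    if chunk_start < len(data):
        digest.append(_chunk_char(data[chunk_start:]))
    return "".join(digest)


def similarity(digest_a: str, digest_b: str) -> float:
    """Score two digests between 0.0 (unrelated) and 1.0 (identical)."""
    return difflib.SequenceMatcher(None, digest_a, digest_b, autojunk=False).ratio()


# Synthetic stand-ins for executables: a base file, a lightly patched
# copy, and an unrelated file.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
patched = original[:2000] + b"PATCHED!" + original[2008:]
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

print("patched  :", similarity(toy_fuzzy_hash(original), toy_fuzzy_hash(patched)))
print("unrelated:", similarity(toy_fuzzy_hash(original), toy_fuzzy_hash(unrelated)))
```

Only the digest needs to be stored or compared, not the file contents, which is the sense in which this style of hashing can support privacy-preserving comparison. A production deployment would rely on an established fuzzy hashing tool rather than this sketch.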

Problem

Research questions and friction points this paper is trying to address.

Identifying unknown software in HPC systems

Recognizing repeated executions of applications

Overcoming unreliable traditional identification methods

Innovation

Methods, ideas, or system contributions that make the work stand out.

Process-level data collection framework

Fuzzy hashing for executable similarity detection

Analysis of process metadata and environment information
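The three contributions above can be pictured as one record per process plus a grouping step. The following Python sketch is an assumption-laden illustration, not SIREN's collector: the `ProcessRecord` fields, the `ENV_ALLOWLIST` contents, and the helper names are hypothetical, and a plain SHA-256 digest stands in for the fuzzy hash, so it only recognizes byte-identical executables.

```python
import hashlib
import os
import sys
from collections import Counter
from dataclasses import dataclass

# Hypothetical allow-list: record only a few environment variables
# instead of the full environment.
ENV_ALLOWLIST = ("SLURM_JOB_ID", "LOADEDMODULES", "LD_LIBRARY_PATH")


@dataclass(frozen=True)
class ProcessRecord:
    pid: int
    executable: str   # resolved path of the running binary
    exe_digest: str   # stand-in for the executable's fuzzy hash
    env: tuple        # (name, value) pairs from the allow-list


def file_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large binaries are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def collect_current_process() -> ProcessRecord:
    """Build a record for the interpreter running this script."""
    exe = os.path.realpath(sys.executable)
    env = tuple((k, os.environ[k]) for k in ENV_ALLOWLIST if k in os.environ)
    return ProcessRecord(os.getpid(), exe, file_digest(exe), env)


def repeated_executions(records) -> dict:
    """Return digests seen more than once, with their execution counts."""
    counts = Counter(r.exe_digest for r in records)
    return {digest: n for digest, n in counts.items() if n > 1}


record = collect_current_process()
print(record.executable, record.exe_digest[:16])
print(repeated_executions([record, record]))
```

Grouping by digest rather than by job or file name is what makes a renamed binary count as the same application; replacing the exact digest with a similarity score over fuzzy hashes would extend this to new versions and recompiled builds.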
