WANDER: An Explainable Decision-Support Framework for HPC

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-performance computing (HPC) configuration spaces are high-dimensional and tightly coupled, and existing predictive tools offer no structured exploration of alternatives, no explanation, and no guided reconfiguration. WANDER is a decision-support framework that, to the authors' knowledge, is the first to unify prediction, exploration, and explanation for HPC tuning under a common query interface. It (1) uses causal models to check that suggestions are consistent with feature-target relationships; (2) ranks suggestions with a composite trade-off score that combines prediction uncertainty, causal consistency, and similarity to historical feature distributions; and (3) synthesizes human-readable counterfactual configurations aligned with user goals and constraints. Across multiple datasets, the authors report that WANDER generates interpretable, trustworthy alternatives that guide users toward their performance objectives.

📝 Abstract
High-performance computing (HPC) systems expose many interdependent configuration knobs that impact runtime, resource usage, power, and variability. Existing predictive tools model these outcomes but do not support structured exploration, explanation, or guided reconfiguration. We present WANDER, a decision-support framework that synthesizes alternate configurations using counterfactual analysis aligned with user goals and constraints. We introduce a composite trade-off score that ranks suggestions based on prediction uncertainty, consistency of feature-target relationships under causal models, and similarity of feature distributions to historical data. To our knowledge, WANDER is the first such system to unify prediction, exploration, and explanation for HPC tuning under a common query interface. Across multiple datasets, WANDER generates interpretable, trustworthy, human-readable alternatives that guide users to achieve their performance objectives.
Problem

Research questions and friction points this paper is trying to address.

Lack of structured exploration in HPC configuration tuning
Absence of guided reconfiguration for performance optimization
Need for explainable decision-support in HPC systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses counterfactual analysis for configuration synthesis
Introduces composite trade-off score for ranking
Unifies prediction, exploration, and explanation under a common query interface
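
The composite trade-off score named above can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the `Candidate` fields, the [0, 1] normalization of each component, the equal default weights, and the weighted-sum form are hypothetical and may differ from what WANDER does.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical counterfactual configuration with three component scores.

    All three values are assumed to be pre-normalized to [0, 1].
    """
    name: str
    uncertainty: float          # prediction uncertainty, lower is better
    causal_consistency: float   # agreement with the causal model, higher is better
    similarity: float           # closeness to historical feature distributions, higher is better


def tradeoff_score(
    c: Candidate,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Assumed weighted sum; uncertainty is inverted so that higher is better."""
    w_u, w_c, w_s = weights
    return w_u * (1.0 - c.uncertainty) + w_c * c.causal_consistency + w_s * c.similarity


def rank(
    candidates: Iterable[Candidate],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[Candidate]:
    """Return candidates ordered from best to worst composite score."""
    return sorted(candidates, key=lambda c: tradeoff_score(c, weights), reverse=True)


# Illustrative values only, not data from the paper.
suggestions = [
    Candidate("fewer_nodes", uncertainty=0.5, causal_consistency=0.5, similarity=0.5),
    Candidate("more_threads", uncertainty=0.1, causal_consistency=0.9, similarity=0.8),
]
ranked = rank(suggestions)
```

In this sketch a user who cares most about staying close to previously observed configurations could pass a larger third weight, for example `rank(suggestions, weights=(0.2, 0.2, 0.6))`.
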
Ankur Lahiry
Doctoral Instructional Assistant
Machine learning, High Performance Computing, Compiler
Banooqa H. Banday
Texas State University, San Marcos, United States
T. Z. Islam
Texas State University, San Marcos, United States