🤖 AI Summary
High-performance computing (HPC) configuration spaces are high-dimensional and tightly coupled, and existing predictive tools offer no structured exploration of alternatives, no causal interpretability, and no user-centered reconfiguration guidance. To address this, the authors propose WANDER, which is to their knowledge the first decision-support framework to unify predictive modeling, counterfactual reasoning, and explainable AI (XAI) for HPC tuning. Specifically, WANDER (1) uses causal models to check the consistency of relationships between configuration features and performance targets; (2) ranks suggestions with a composite trade-off score that combines prediction uncertainty, causal consistency, and similarity to historical configuration distributions; and (3) generates human-readable counterfactual configuration recommendations that satisfy user-specified objectives and constraints. Evaluated on multiple HPC datasets, the framework produces interpretable and trustworthy alternatives that guide users toward their performance objectives.
📝 Abstract
High-performance computing (HPC) systems expose many interdependent configuration knobs that affect runtime, resource usage, power, and variability. Existing predictive tools model these outcomes but do not support structured exploration, explanation, or guided reconfiguration. We present WANDER, a decision-support framework that synthesizes alternative configurations using counterfactual analysis aligned with user goals and constraints. We introduce a composite trade-off score that ranks suggestions by three criteria: prediction uncertainty, consistency of feature-target relationships with causal models, and similarity of feature distributions to historical data. To our knowledge, WANDER is the first system to unify prediction, exploration, and explanation for HPC tuning under a common query interface. Across multiple datasets, WANDER generates interpretable, trustworthy, human-readable alternatives that guide users toward their performance objectives.
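The composite trade-off score described above can be illustrated with a minimal sketch. The abstract does not give the actual formula, so everything here is an assumption made for illustration: the `Candidate` fields, the `trade_off_score` and `rank_candidates` names, the scaling of each term to [0, 1], and the equal-weight linear combination are all hypothetical and may differ from what WANDER does.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    """A hypothetical counterfactual configuration with three precomputed terms."""
    name: str
    uncertainty: float         # prediction uncertainty in [0, 1]; lower is better
    causal_consistency: float  # agreement with the causal model in [0, 1]; higher is better
    similarity: float          # closeness to historical feature distribution in [0, 1]; higher is better


def trade_off_score(
    c: Candidate,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Assumed linear combination; uncertainty is inverted so higher scores are better."""
    w_u, w_c, w_s = weights
    return w_u * (1.0 - c.uncertainty) + w_c * c.causal_consistency + w_s * c.similarity


def rank_candidates(
    candidates: List[Candidate],
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> List[Candidate]:
    """Return candidates ordered from highest to lowest trade-off score."""
    return sorted(candidates, key=lambda c: trade_off_score(c, weights), reverse=True)


# Illustrative values only; these are not results from the paper.
candidates = [
    Candidate("more_nodes", uncertainty=0.5, causal_consistency=0.6, similarity=0.9),
    Candidate("larger_block", uncertainty=0.1, causal_consistency=0.9, similarity=0.8),
    Candidate("exotic_layout", uncertainty=0.2, causal_consistency=0.3, similarity=0.4),
]
ranked = rank_candidates(candidates)
for c in ranked:
    print(f"{c.name}: {trade_off_score(c):.3f}")
```

In this sketch, a suggestion with low uncertainty but weak causal support or little resemblance to historical configurations is pushed down the ranking, which is the qualitative behavior the abstract attributes to the score.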