🤖 AI Summary
This work addresses the theoretical applicability of Shapley value-based feature attribution in large language model (LLM) decision support systems, specifically under the inherent stochasticity of LLM inference. Method: We conduct the first systematic analysis of the theoretical guarantees of Shapley values in non-deterministic inference settings, revealing that the standard Shapley axioms cannot be strictly satisfied. To reconcile this, we propose a principled reformulation of the utility function and sampling strategy, enabling controllable trade-offs among approximate consistency, computational efficiency, and explanation fidelity. We introduce three Shapley variants, each with formally characterized applicability conditions and an explicit account of its effect on computational efficiency, principle satisfaction, and interpretability. Results: Empirical evaluation across medical and financial decision-making tasks validates the framework’s effectiveness. Our work establishes the first theoretically grounded, randomness-aware Shapley analysis framework for LLM interpretability, accompanied by practical implementation guidelines.
📝 Abstract
Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that is guaranteed to satisfy several desirable principles when inference is deterministic. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then show when the Shapley value principles can and cannot be guaranteed across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between the speed of explainable inference, agreement with exact Shapley value attributions, and principle satisfaction.
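To make the role of stochasticity concrete, the following is a minimal sketch and not the paper's implementation. It computes exact Shapley values for a toy three-feature game whose value function is perturbed by Gaussian noise, a hypothetical stand-in for a stochastic LLM call. The names `shapley`, `make_noisy_value`, and `cached`, the additive weights, and the noise model are all assumptions made for illustration. The sketch contrasts re-sampling the value function on every query with sampling each coalition once and caching the result; in the cached case the attributions sum exactly to the sampled v(N) - v(∅), which is the efficiency principle for that one realized game.

```python
import itertools
import math
import random


def shapley(n, value):
    """Exact Shapley values for players 0..n-1, given value(frozenset) -> float."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            w = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += w * (value(s | {i}) - value(s))
    return phi


def make_noisy_value(rng, noise=0.1):
    """Hypothetical stochastic utility: additive game plus Gaussian noise."""
    weights = [0.5, 0.3, 0.2]  # assumed per-feature contributions

    def value(s):
        return sum(weights[j] for j in s) + rng.gauss(0.0, noise)

    return value


def cached(value):
    """Sample each coalition once so repeated queries return the same number."""
    memo = {}

    def wrapped(s):
        if s not in memo:
            memo[s] = value(s)
        return memo[s]

    return wrapped


if __name__ == "__main__":
    rng = random.Random(0)
    full, empty = frozenset(range(3)), frozenset()

    # Variant A: every query re-samples the noisy utility.
    resampled = shapley(3, make_noisy_value(rng))
    print("re-sampled attributions:", resampled, "sum:", sum(resampled))

    # Variant B: one sample per coalition, reused everywhere.
    v = cached(make_noisy_value(rng))
    fixed = shapley(3, v)
    print("cached attributions:", fixed, "sum:", sum(fixed))
    print("v(N) - v(empty) for the cached game:", v(full) - v(empty))
```

In the re-sampled variant, v(N) and v(∅) take a different value on each call, so there is no single total for the attributions to match. Caching restores efficiency for the sampled game, but the resulting attributions still vary from run to run.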