llmSHAP: A Principled Approach to LLM Explainability

📅 2025-11-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work examines whether Shapley value-based feature attribution remains theoretically sound in large language model (LLM) decision support systems, where inference is inherently stochastic.

Method: We conduct the first systematic analysis of the Shapley value's theoretical guarantees under non-deterministic inference and show that the standard Shapley axioms are not guaranteed to hold strictly in that setting. To reconcile this, we propose a principled reformulation of the utility function and sampling strategy that allows controllable trade-offs among approximate consistency, computational efficiency, and explanation fidelity. We introduce three Shapley variants, each with formally characterized applicability conditions and an explicit account of how it affects the trade-off among efficiency, principle satisfaction, and interpretability.

Results: Empirical evaluation on medical and financial decision-making tasks supports the framework's effectiveness. The work presents a theoretically grounded, randomness-aware Shapley analysis framework for LLM interpretability, accompanied by practical implementation guidelines.

📝 Abstract

Feature attribution methods help make machine learning-based inference explainable by determining how much one or several features have contributed to a model's output. A particularly popular attribution method is based on the Shapley value from cooperative game theory, a measure that guarantees the satisfaction of several desirable principles, assuming deterministic inference. We apply the Shapley value to feature attribution in large language model (LLM)-based decision support systems, where inference is, by design, stochastic (non-deterministic). We then demonstrate when we can and cannot guarantee Shapley value principle satisfaction across different implementation variants applied to LLM-based decision support, and analyze how the stochastic nature of LLMs affects these guarantees. We also highlight trade-offs between explainable inference speed, agreement with exact Shapley value attributions, and principle attainment.
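
For reference, the Shapley value and the efficiency principle mentioned in the abstract can be written in the standard cooperative game theory form. The notation here is an assumption and not necessarily the paper's own: $N$ is the set of features, $v$ is the utility (value) function over feature subsets, and $\phi_i(v)$ is the attribution for feature $i$.

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(|N| - |S| - 1)!}{|N|!}
  \bigl( v(S \cup \{i\}) - v(S) \bigr),
\qquad
\sum_{i \in N} \phi_i(v) = v(N) - v(\emptyset).
```

The second identity (efficiency) relies on each $v(S)$ being a single fixed number. When $v(S)$ is obtained from a stochastic LLM call, repeated evaluations of the same subset can differ, which is the setting the abstract describes.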

Problem

Research questions and friction points this paper is trying to address.

Applying Shapley values to explain stochastic LLM outputs

Analyzing principle guarantees in LLM-based decision support

Evaluating trade-offs between inference speed, agreement with exact Shapley value attributions, and principle attainment
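
The first friction point can be illustrated with a minimal sketch. It computes exact Shapley values for a toy utility that adds Gaussian noise on every call, which stands in for a stochastic LLM call. The feature names, scores, noise model, and helper functions are all hypothetical and are not taken from the paper. The sketch compares the efficiency gap when each coalition's value is sampled once and cached versus re-sampled on every call.

```python
import itertools
import math
import random


def shapley_values(players, value):
    """Exact Shapley values via the subset formula.

    `value` maps a frozenset of players to a float.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def make_noisy_value(rng, noise=0.5):
    """Hypothetical stand-in for an LLM-backed utility: fixed score plus noise."""
    base = {"age": 1.0, "symptom": 2.0, "history": 0.5}

    def value(coalition):
        return sum(base[f] for f in coalition) + rng.gauss(0.0, noise)

    return value


def cached(value):
    """Sample each coalition once and reuse the result afterwards."""
    cache = {}

    def wrapper(coalition):
        if coalition not in cache:
            cache[coalition] = value(coalition)
        return cache[coalition]

    return wrapper


def efficiency_gap(players, value):
    """Absolute difference between sum of attributions and v(N) - v(empty)."""
    phi = shapley_values(players, value)
    grand = value(frozenset(players))
    empty = value(frozenset())
    return abs(sum(phi.values()) - (grand - empty))


if __name__ == "__main__":
    players = ["age", "symptom", "history"]
    print("cached gap:  ", efficiency_gap(players, cached(make_noisy_value(random.Random(0)))))
    print("uncached gap:", efficiency_gap(players, make_noisy_value(random.Random(0))))
```

With caching, the attributions sum to the cached `v(N) - v(∅)` up to floating-point error, because the game is effectively frozen after the first draw. Without caching, every call is a new draw, so the sum generally does not match a freshly sampled `v(N) - v(∅)`.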

Innovation

Methods, ideas, or system contributions that make the work stand out.

Applies Shapley value to stochastic LLM decision support

Analyzes principle guarantees across different implementation variants

Evaluates trade-offs between speed, accuracy, and principle attainment
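
The speed versus agreement trade-off can be sketched with generic permutation sampling, a common Shapley approximation. This is an illustration only and is not claimed to be one of the paper's variants. The toy utility, feature names, and the function `permutation_shapley` are hypothetical. The utility is deterministic here so that the effect of the sample count is isolated from LLM randomness.

```python
import random

# Hypothetical per-feature scores for a toy deterministic utility.
BASE = {"age": 1.0, "symptom": 2.0, "history": 0.5}


def value(coalition):
    """Toy utility: additive scores plus a bonus when two features co-occur."""
    bonus = 1.0 if {"symptom", "history"} <= coalition else 0.0
    return sum(BASE[f] for f in coalition) + bonus


def permutation_shapley(players, value, num_samples, rng):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings."""
    totals = {p: 0.0 for p in players}
    for _ in range(num_samples):
        order = list(players)
        rng.shuffle(order)
        coalition = frozenset()
        prev = value(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = value(coalition)
            totals[p] += cur - prev  # marginal contribution of p in this order
            prev = cur
    return {p: t / num_samples for p, t in totals.items()}


if __name__ == "__main__":
    players = list(BASE)
    for num_samples in (5, 50, 2000):
        estimate = permutation_shapley(players, value, num_samples, random.Random(0))
        print(num_samples, estimate)
```

For this toy game the exact Shapley values are 1.0 for `age`, 2.5 for `symptom`, and 1.0 for `history`. Fewer sampled orderings mean fewer utility evaluations but looser agreement with those exact values. Because each ordering's marginal contributions telescope, the estimates still sum to `v(N) - v(∅)` when the utility is deterministic; with a stochastic utility that guarantee depends on how repeated evaluations are handled.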

🔎 Similar Papers

Evaluating the Reliability of Self-Explanations in Large Language Models