🤖 AI Summary
Existing feature attribution methods for sequential deep learning models fail to disentangle the independent effects of *feature values* and *feature positions*, limiting predictive interpretability. To address this, we propose OrdShap—the first game-theoretic attribution method explicitly decoupling these two contributions. OrdShap systematically permutes feature positions in sequence space, extends Shapley values to quantify positional sensitivity, and establishes a theoretical connection to Sanchez-Bergantiños values. It is model-agnostic: it requires no architectural assumptions and applies to arbitrary black-box sequential models. Experiments on clinical time-series, NLP, and synthetic benchmarks demonstrate that OrdShap accurately isolates the importance of feature values from that of their positions, substantially enhancing insight into model sensitivity to ordering. By enabling fine-grained analysis of sequential dependencies, OrdShap introduces a novel paradigm for interpretable sequence modeling.
📝 Abstract
Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume fixed feature ordering, conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sanchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.
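The core idea of measuring position sensitivity can be sketched as follows. This is a minimal illustrative estimator, not the paper's actual OrdShap algorithm (which is Shapley-based): it simply averages the change in a black-box model's output when one feature is moved to a random other position, holding all feature values fixed. The `model` function and the sampling scheme here are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(seq):
    # Toy black-box "sequential" model: a position-weighted sum,
    # so later positions contribute more to the output.
    weights = np.arange(1, len(seq) + 1)
    return float(np.dot(weights, seq))

def positional_sensitivity(f, seq, i, n_perm=200):
    """Hypothetical estimator (not OrdShap itself): average change in
    f's output when the feature at index i is relocated to a uniformly
    random position, keeping all feature values unchanged."""
    base = f(seq)
    deltas = []
    for _ in range(n_perm):
        j = int(rng.integers(len(seq)))
        permuted = list(seq)
        val = permuted.pop(i)
        permuted.insert(j, val)
        deltas.append(f(np.array(permuted)) - base)
    return float(np.mean(deltas))

# A large value sitting at the first (lowest-weight) position:
seq = np.array([5.0, 0.0, 0.0, 0.0])
print(positional_sensitivity(model, seq, i=0))  # positive: moving it later raises the output
```

A nonzero result indicates the model is sensitive to *where* the feature appears, independent of *what* its value is; a position-invariant model would yield sensitivity near zero for every feature.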