🤖 AI Summary
This paper addresses how training samples influence model explanations, not merely predictions. It formally defines "sample influence on explanations" and proposes an explanation-centric paradigm for assessing that influence. Methodologically, it combines data influence functions with gradient approximations in an efficient algorithm that embeds group-sensitive attribute constraints into counterfactual explanation generation, identifying the training samples that most affect explanation outcomes (e.g., counterfactual recourse cost). Its core contributions are: (1) a framework for explanation attribution that enables fairness-aware attribution analysis; and (2) an empirical validation on multiple fairness-sensitive datasets, which pinpoints the data sources driving explanation bias and thereby supports explanation transparency and debugging.
📝 Abstract
Explainable AI (XAI) is a popular approach for analyzing the reasoning of AI systems by explaining their decision-making, e.g., by providing a counterfactual explanation of how to achieve recourse. However, when an explanation is unexpected, the user might want to learn what caused it, e.g., which properties of the training data are responsible for the observed explanation. Under the umbrella of data valuation, first approaches have been proposed that estimate the influence of data samples on a given model. In this work, we take a slightly different stance: we are interested in the influence of single samples on a model explanation rather than on the model itself. Hence, we propose the novel problem of identifying training data samples that have a high influence on a given explanation (or a related quantity), and we investigate the particular case of differences in the cost of recourse between protected groups. For this, we propose an algorithm that identifies such influential training samples.
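To make the problem statement concrete, the following is a minimal sketch of what "influence of a training sample on the recourse-cost gap" could mean in the simplest setting. It is not the algorithm proposed in the paper, which the abstract does not specify. It assumes a linear classifier (so the minimal L2 recourse cost of a rejected point is its distance to the decision boundary), synthetic data, and brute-force leave-one-out retraining in place of any influence approximation. All function names (`fit_logreg`, `recourse_cost_gap`, `loo_influence`) are hypothetical.

```python
import numpy as np

def fit_logreg(X, y, l2=1.0, iters=25):
    """L2-regularised logistic regression fitted with Newton's method.
    Assumption for simplicity: the bias term is regularised too."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ theta))
        grad = Xb.T @ (p - y) + l2 * theta
        hess = (Xb * (p * (1.0 - p))[:, None]).T @ Xb + l2 * np.eye(Xb.shape[1])
        theta -= np.linalg.solve(hess, grad)
    return theta[:-1], theta[-1]

def recourse_cost_gap(w, b, X, group):
    """Mean recourse cost of rejected points in group 1 minus group 0.
    For a linear model, the minimal L2 change that flips a rejected point
    is its distance to the decision boundary."""
    score = X @ w + b
    cost = np.maximum(-score, 0.0) / np.linalg.norm(w)
    rejected = score < 0
    means = []
    for g in (0, 1):
        mask = rejected & (group == g)
        means.append(cost[mask].mean() if mask.any() else 0.0)
    return means[1] - means[0]

def loo_influence(X, y, group, l2=1.0):
    """Leave-one-out influence of each training sample on the gap:
    gap after retraining without sample i, minus the gap of the full model.
    The gap is always evaluated on the full data set."""
    w, b = fit_logreg(X, y, l2)
    base = recourse_cost_gap(w, b, X, group)
    infl = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        w_i, b_i = fit_logreg(X[keep], y[keep], l2)
        infl[i] = recourse_cost_gap(w_i, b_i, X, group) - base
    return base, infl

# Synthetic example (assumed data): group 1 is shifted away from acceptance.
rng = np.random.default_rng(0)
n = 60
group = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) - 0.8 * group[:, None]
y = (X[:, 0] + X[:, 1] + 0.3 * rng.normal(size=n) > 0).astype(float)

base, infl = loo_influence(X, y, group)
top = np.argsort(-np.abs(infl))[:5]  # samples whose removal changes the gap most
print("recourse-cost gap of full model:", base)
print("most influential training indices:", top)
```

Leave-one-out retraining costs one model fit per training sample, so it only scales to small problems. It is used here because it states the target quantity exactly; an influence-function or gradient-based estimate would approximate the same differences without retraining.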