Analyzing the Influence of Training Samples on Explanations

📅 2024-06-05

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This paper studies how training samples influence model explanations, not just predictions. It formally defines “sample influence on explanations” and proposes an explanation-centric approach to influence assessment. Methodologically, it combines data influence functions with gradient approximations in an efficient algorithm that embeds group-sensitive attribute constraints into counterfactual explanation generation, identifying the training samples that most affect explanation outcomes (e.g., counterfactual recourse cost). Its core contributions are: (1) a framework for explanation attribution that enables fairness-aware attribution analysis; and (2) an empirical evaluation on multiple fairness-sensitive datasets that pinpoints the data sources driving explanation bias, which improves explanation transparency and aids debugging.

📝 Abstract

EXplainable AI (XAI) constitutes a popular method to analyze the reasoning of AI systems by explaining their decision-making, e.g. providing a counterfactual explanation of how to achieve recourse. However, in cases such as unexpected explanations, the user might be interested in learning about the cause of this explanation -- e.g. properties of the utilized training data that are responsible for the observed explanation. Under the umbrella of data valuation, first approaches have been proposed that estimate the influence of data samples on a given model. In this work, we take a slightly different stance, as we are interested in the influence of single samples on a model explanation rather than the model itself. Hence, we propose the novel problem of identifying training data samples that have a high influence on a given explanation (or related quantity) and investigate the particular case of differences in the cost of the recourse between protected groups. For this, we propose an algorithm that identifies such influential training samples.
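
To make the quantity described in the abstract concrete, here is a minimal sketch of a brute-force leave-one-out baseline. It is not the paper's algorithm; everything in it is an assumption chosen for illustration: a small logistic regression trained by gradient descent, recourse cost taken as the L2 distance from a rejected point to the linear decision boundary, and the group gap defined as the difference in mean recourse cost between two protected groups (accepted individuals count as zero cost). The function names (`train_logreg`, `recourse_cost`, `cost_gap`, `loo_influence`) and the toy data are hypothetical.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    """Fit logistic regression by full-batch gradient descent; returns (w, b)."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * (gw[j] / n + l2 * wj) for j, wj in enumerate(w)]
        b -= lr * gb / n
    return w, b

def recourse_cost(w, b, x):
    """L2 distance to the decision boundary for rejected points, 0 for accepted ones."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    if score >= 0:
        return 0.0
    norm = math.sqrt(sum(wj * wj for wj in w))
    return -score / max(norm, 1e-12)  # guard against a degenerate all-zero w

def cost_gap(w, b, X, groups):
    """Mean recourse cost of group 0 minus mean recourse cost of group 1."""
    means = []
    for g in (0, 1):
        costs = [recourse_cost(w, b, x) for x, gi in zip(X, groups) if gi == g]
        means.append(sum(costs) / len(costs))
    return means[0] - means[1]

def loo_influence(X, y, groups):
    """Influence of sample i = gap(full model) - gap(model retrained without i).

    The gap is always evaluated on the full set X, so only the training
    set changes between the two models being compared.
    """
    w, b = train_logreg(X, y)
    base = cost_gap(w, b, X, groups)
    influences = []
    for i in range(len(X)):
        w_i, b_i = train_logreg(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        influences.append(base - cost_gap(w_i, b_i, X, groups))
    return base, influences

# Toy data (hypothetical): two features, binary label, binary protected group.
X = [[0.0, 0.0], [0.5, 0.2], [1.0, 1.0], [2.0, 1.5],
     [0.2, 0.8], [1.5, 0.5], [2.5, 2.0], [0.1, 0.1]]
y = [0, 0, 1, 1, 0, 1, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]

base_gap, influences = loo_influence(X, y, groups)
# Rank training samples by how strongly their removal changes the gap.
ranking = sorted(range(len(X)), key=lambda i: abs(influences[i]), reverse=True)
print("recourse-cost gap:", base_gap)
print("most influential sample indices:", ranking[:3])
```

Retraining once per sample is only feasible for small datasets; it serves here as a reference point against which a cheaper approximation could be compared.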

Problem

Research questions and friction points this paper is trying to address.

Understanding how training data influences AI explanations

Identifying training samples that affect a model's internal reasoning

Measuring the impact of training data on differences in recourse cost between protected groups
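
One way to write these questions down, offered as an assumed leave-one-out formalization rather than the paper's own definition: let $\mathcal{D}$ be the training set, $h_{\mathcal{D}}$ the model trained on it, $c_h(x)$ the recourse cost of individual $x$ under model $h$, and $G_0, G_1$ the two protected groups. All symbols here are introduced for illustration.

```latex
\[
  \mathcal{I}(z_i) \;=\; Q\bigl(h_{\mathcal{D}}\bigr) \;-\; Q\bigl(h_{\mathcal{D}\setminus\{z_i\}}\bigr),
  \qquad
  Q(h) \;=\; \frac{1}{|G_0|}\sum_{x \in G_0} c_h(x) \;-\; \frac{1}{|G_1|}\sum_{x \in G_1} c_h(x)
\]
```

Under this reading, a training sample $z_i$ with large $|\mathcal{I}(z_i)|$ is one whose removal noticeably changes the recourse-cost gap $Q$ between the groups.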

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data valuation to analyze training samples' influence

Algorithm identifies samples shaping model explanations

Focus on a model's explanations rather than on the model itself

🔎 Similar Papers

FaithLM: Towards Faithful Explanations for Large Language Models