AI Summary
This work addresses the limited use of deep learning models in high-stakes domains, which stems from their insufficient interpretability. It proposes the first extension of influence functions to the concept level by combining them with Concept Bottleneck Models (CBMs), improving the interpretability of NLP models at both the sample and concept levels. The approach identifies the training samples and concepts that most strongly affect predictions, clarifying their causal roles and enabling data debugging and behavioral intervention without retraining the model. Experiments on the CEBaB and Yelp datasets show that adjusting the labels and weights of influential samples restores model performance to baseline levels, and that modifying key concepts observably changes model behavior, supporting the method's practicality and effectiveness.
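The summary does not spell out the influence computation. As a minimal sketch, assuming the classic upweighting influence function of Koh and Liang (2017), I(z, z_test) = -∇L(z_test)ᵀ H⁻¹ ∇L(z), applied to an L2-regularized logistic regression, the code below scores every training sample by how much it raises or lowers the loss on one test example. The logistic regression is only a stand-in for the paper's NLP models, and all function names (`fit_logreg`, `influence_on_test_loss`) and the synthetic data are hypothetical.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def log_loss(w, x, y):
    """Binary cross-entropy of a single example (x, y) under weights w."""
    p = sigmoid(x @ w)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fit_logreg(X, y, lam=1e-2, weights=None, iters=50):
    """Minimize (1/n) * sum_i weights_i * log_loss_i + (lam/2) * ||w||^2 with plain Newton steps."""
    n, d = X.shape
    weights = np.ones(n) if weights is None else weights
    w = np.zeros(d)
    for _ in range(iters):
        p = sigmoid(X @ w)
        grad = X.T @ (weights * (p - y)) / n + lam * w
        H = (X.T * (weights * p * (1 - p))) @ X / n + lam * np.eye(d)
        w = w - np.linalg.solve(H, grad)
    return w


def influence_on_test_loss(X, y, x_test, y_test, w, lam=1e-2):
    """I_up,loss(z_i, z_test) = -grad_test^T H^{-1} grad_i for every training example z_i.

    > 0: upweighting z_i raises the test loss (harmful sample).
    < 0: upweighting z_i lowers the test loss (helpful sample).
    """
    n, d = X.shape
    p = sigmoid(X @ w)
    H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)  # Hessian of the training objective at w
    train_grads = (p - y)[:, None] * X                    # per-example loss gradients, shape (n, d)
    g_test = (sigmoid(x_test @ w) - y_test) * x_test      # test loss gradient, shape (d,)
    return -train_grads @ np.linalg.solve(H, g_test)


# Toy usage on synthetic, noisy (non-separable) data.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(size=n) > 0).astype(float)
x_test, y_test = rng.normal(size=3), 1.0

w = fit_logreg(X, y)
scores = influence_on_test_loss(X, y, x_test, y_test, w)
harmful = np.argsort(scores)[-5:][::-1]  # largest positive scores first
helpful = np.argsort(scores)[:5]         # most negative scores first
print("most harmful:", harmful, "most helpful:", helpful)
```

Removing sample i changes the test loss by roughly -scores[i] / n, so the most positive scores mark candidates for relabeling or down-weighting, which is the kind of retraining-free data debugging the summary describes. For a large NLP model the explicit Hessian is impractical; implementations of this technique typically approximate H⁻¹v instead (for example with LiSSA or conjugate gradients).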
Abstract
In recent years, the black-box nature of deep learning models has limited their application in high-stakes domains such as medical diagnosis and finance, where interpretability is essential. To address this, we propose a novel approach that uses influence functions to enhance the interpretability of NLP models at both the sample and concept levels. Experiments on the CEBaB and Yelp datasets show that influence functions effectively identify the training samples, both helpful and harmful, that most strongly affect model predictions. By adjusting the labels and weights of these samples, we demonstrate that model performance can be restored to baseline levels without retraining, confirming the value of influence functions for efficient data debugging. Furthermore, our concept-level analysis identifies key concepts within Concept Bottleneck Models (CBMs) that significantly affect predictions. Modifying these concepts observably alters model behavior, providing clear insight into the model's decision process.
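The effect of modifying a concept can be illustrated with a test-time intervention on a CBM's label head. This is a minimal sketch, not the paper's concept-level influence function: it assumes a linear head over concept activations (a common CBM design, not confirmed for this paper), borrows CEBaB's aspect names (food, service, ambiance, noise) only for readability, and uses made-up weights and activations. The hypothetical helper `concept_effects` ranks concepts by how much zeroing each one lowers the probability of the originally predicted class.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def predict_from_concepts(c, W, b):
    """Linear CBM label head: class probabilities from concept activations c (shape (k,))."""
    return softmax(W @ c + b)


def intervene(c, j, value):
    """Return a copy of the concept vector with concept j overwritten by `value`."""
    c_new = c.copy()
    c_new[j] = value
    return c_new


def concept_effects(c, W, b, value=0.0):
    """Change in the originally predicted class's probability when each concept is set to `value`."""
    p = predict_from_concepts(c, W, b)
    cls = int(np.argmax(p))
    return np.array([
        predict_from_concepts(intervene(c, j, value), W, b)[cls] - p[cls]
        for j in range(len(c))
    ])


# Hypothetical toy head: made-up weights, concept names borrowed from CEBaB's aspects.
concepts = ["food", "service", "ambiance", "noise"]
W = np.array([[-2.0, -1.5, -0.5, 0.0],   # class 0: negative review
              [ 2.0,  1.5,  0.5, 0.0]])  # class 1: positive review
b = np.zeros(2)
c = np.array([0.9, 0.8, 0.3, 0.5])       # predicted concept activations for one review

effects = concept_effects(c, W, b)
for name, delta in sorted(zip(concepts, effects), key=lambda t: t[1]):
    print(f"{name}: {delta:+.4f}")
```

Because the head is linear, the most influential concept here is simply the one with the largest weight times activation; for a nonlinear head the same intervention loop still applies, but the ranking no longer reduces to reading off weights.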