Application of integrated gradients explainability to sociopsychological semantic markers

📅 2025-03-06

🤖 AI Summary

Existing text classification models lack interpretability for sociopsychological semantic constructs, such as agency, that are non-affective and theoretically grounded in social psychology. Method: This work systematically applies Integrated Gradients (IG) to word-level attribution analysis for such constructs, building on BERTAgent, a verified deep learning classifier for agency. For few-shot settings where only a small labeled dataset is available, it adopts a training strategy that deliberately encourages overfitting to make each class more distinctive; IG is then used to visualize token-level feature contributions. Contribution/Results: Model performance and IG parameters are tested, alternatives to IG are evaluated, and the usefulness of the attributions is verified in a relevant application scenario and analyzed from a social-psychology perspective. The experiments indicate that the method identifies semantically salient tokens associated with agency, including under low-resource conditions, which improves model transparency and domain-specific interpretability for explainable NLP in computational social science.
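
For reference, the standard integrated-gradients formulation is given below for an input embedding x, a baseline x', and a model output F (for example, a class probability). The summary does not state which baseline is used or how embedding-level attributions are pooled into a per-token score; summing over embedding dimensions, shown last, is one common choice and is an assumption here.

```latex
% Integrated gradients for input x, baseline x', and model output F
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1 \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i} \, d\alpha

% Riemann approximation with m interpolation steps
\mathrm{IG}_i(x) \approx (x_i - x'_i) \, \frac{1}{m} \sum_{k=1}^{m} \frac{\partial F\bigl(x' + \tfrac{k}{m}(x - x')\bigr)}{\partial x_i}

% Completeness: exact attributions sum to the change in output
\sum_i \mathrm{IG}_i(x) = F(x) - F(x')

% Token-level score for token t over embedding dimensions j (assumed aggregation)
A_t = \sum_{j} \mathrm{IG}_{t,j}(x)
```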

📝 Abstract

Classification of textual data in terms of sentiment, or more nuanced sociopsychological markers (e.g., agency), is now a popular approach commonly applied at the sentence level. In this paper, we exploit the integrated gradients (IG) method to capture the classification output at the word level, revealing which words actually contribute to the classification process. This approach improves explainability and provides in-depth insights into the text. We focus on sociopsychological markers beyond sentiment and investigate how to apply IG effectively to agency, one of the very few markers for which a verified deep learning classifier, BERTAgent, is currently available. Performance and system parameters are carefully tested, alternatives to the IG approach are evaluated, and the usefulness of the result is verified in a relevant application scenario. The method is also applied in a scenario where only a small labeled dataset is available, with the aim of exploiting IG to identify the salient words that contribute to building the classes corresponding to relevant sociopsychological markers. To achieve this, an uncommon training procedure that encourages overfitting is employed to enhance the distinctiveness of each class. The results are analyzed through the lens of social psychology, offering valuable insights.
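
A minimal, self-contained sketch of word-level IG follows, assuming NumPy and a hypothetical toy classifier (mean-pooled token embeddings, one tanh layer, sigmoid output) standing in for BERTAgent, where gradients would normally come from automatic differentiation. The all-zero baseline, the midpoint Riemann rule, and the names `forward`, `grad_wrt_emb`, and `integrated_gradients` are illustrative assumptions, not the paper's implementation. The weights are random, so the printed scores only demonstrate the mechanics.

```python
import numpy as np

def forward(emb, W, c, v, b):
    """Hypothetical toy classifier: mean-pool embeddings, tanh layer, sigmoid probability."""
    pooled = emb.mean(axis=0)
    h = np.tanh(W @ pooled + c)
    return 1.0 / (1.0 + np.exp(-(v @ h + b)))

def grad_wrt_emb(emb, W, c, v, b):
    """Analytic gradient of the output probability w.r.t. every token embedding."""
    T = emb.shape[0]
    pooled = emb.mean(axis=0)
    h = np.tanh(W @ pooled + c)
    p = 1.0 / (1.0 + np.exp(-(v @ h + b)))
    d_pooled = p * (1.0 - p) * (W.T @ (v * (1.0 - h ** 2)))
    # Mean pooling passes the same gradient, scaled by 1/T, to each token.
    return np.tile(d_pooled / T, (T, 1))

def integrated_gradients(emb, baseline, params, steps=50):
    """Midpoint Riemann approximation of IG, summed over embedding dims per token."""
    total = np.zeros_like(emb)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = baseline + alpha * (emb - baseline)
        total += grad_wrt_emb(point, *params)
    avg_grad = total / steps
    return ((emb - baseline) * avg_grad).sum(axis=1)  # one score per token

rng = np.random.default_rng(0)
tokens = ["we", "can", "achieve", "this", "goal"]
dim, hidden = 8, 4
emb = rng.normal(size=(len(tokens), dim))
baseline = np.zeros_like(emb)  # assumed baseline: all-zero embeddings
params = (rng.normal(size=(hidden, dim)), rng.normal(size=hidden), rng.normal(size=hidden), 0.0)

scores = integrated_gradients(emb, baseline, params, steps=200)
for tok, s in zip(tokens, scores):
    print(f"{tok:>8s} {s:+.4f}")

gap = forward(emb, *params) - forward(baseline, *params)
print("completeness error:", abs(scores.sum() - gap))
```

The last line checks the completeness property: the token scores should sum, up to discretization error, to the change in predicted probability between the baseline and the input, and raising `steps` shrinks that error.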

Problem

Research questions and friction points this paper is trying to address.

Enhance the explainability of classifiers for sociopsychological semantic markers.

Apply integrated gradients to identify word-level contributions in text classification.

Apply IG effectively to agency markers, including when only a small labeled dataset is available.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated gradients for word-level classification insights

Training that deliberately encourages overfitting to enhance class distinctiveness

BERTAgent for sociopsychological marker classification
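
Once per-token attribution scores exist, salient words per class can be surfaced by averaging attributions over a labeled set. The following is a minimal sketch, assuming subword pieces have already been merged into whole words and each example carries its class label; the helper `salient_words`, the label names, and the scores in the usage lines are hypothetical placeholders, not data or results from the paper.

```python
from collections import defaultdict

def salient_words(examples, top_k=3):
    """Rank words per class by mean attribution.

    examples: iterable of (words, scores, label) triples, one per text.
    Returns {label: [top_k words with the highest mean attribution]}.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for words, scores, label in examples:
        for word, score in zip(words, scores):
            key = word.lower()  # case-fold so "We" and "we" are pooled
            sums[label][key] += score
            counts[label][key] += 1
    ranking = {}
    for label, word_sums in sums.items():
        means = {w: word_sums[w] / counts[label][w] for w in word_sums}
        ranking[label] = sorted(means, key=means.get, reverse=True)[:top_k]
    return ranking

# Placeholder inputs for illustration only.
examples = [
    (["We", "will", "win"], [0.1, 0.4, 0.7], "high_agency"),
    (["we", "tried", "hard"], [0.1, 0.2, 0.6], "high_agency"),
    (["they", "gave", "up"], [0.05, 0.3, 0.6], "low_agency"),
]
result = salient_words(examples)
print(result)
```

Averaging rather than summing keeps frequent function words from dominating purely by count. A minimum-frequency filter would be a natural addition, since a word seen only once can otherwise top a class.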
