🤖 AI Summary
Existing explainability methods for Transformer models lack a systematic, standardized evaluation framework. Method: This paper introduces EvalxNLP, an XAI benchmarking framework for transformer-based NLP models. It supports eight widely used feature attribution techniques (e.g., Integrated Gradients, LIME, SHAP) and evaluates them along three core dimensions: faithfulness, plausibility, and complexity, each measured with multiple quantitative metrics. The framework pairs these automated metrics with interactive, LLM-generated natural language explanations that help users interpret attribution outputs and evaluation results. Standardized APIs and modular algorithm implementations keep it extensible, reproducible, and accessible to users with varying expertise. Contribution/Results: Human evaluation indicates high user satisfaction, supporting EvalxNLP as a practical platform for benchmarking explanation methods across diverse user groups. This work advances NLP interpretability research toward systematization, comparability, and democratization.
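To make the attribution side concrete: Integrated Gradients, one of the supported methods, attributes a prediction to input features by averaging gradients along a straight-line path from a baseline to the input. The sketch below is a minimal, self-contained illustration on a toy differentiable model (not EvalxNLP's actual API; the function names are ours) and checks IG's completeness property, that attributions sum to the difference in model output between the input and the baseline.

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Approximate Integrated Gradients attributions for input x.

    grad_f: gradient of a scalar model function w.r.t. its input.
    Attributions are (x - baseline) times the average gradient along
    the straight-line path from baseline to x.
    """
    alphas = np.linspace(0.0, 1.0, steps)
    # Interpolated inputs along the path baseline -> x
    path = baseline + alphas[:, None] * (x - baseline)
    avg_grad = np.mean([grad_f(p) for p in path], axis=0)
    return (x - baseline) * avg_grad

# Toy linear "model": f(x) = w . x, so the gradient is the constant w.
# For linear models IG reduces to (x - baseline) * w exactly.
w = np.array([0.5, -1.0, 2.0])
f = lambda x: float(w @ x)
grad_f = lambda x: w

x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
print(attr)              # per-feature attributions: [0.5, -2.0, 6.0]
# Completeness check: attributions sum to f(x) - f(baseline)
print(attr.sum(), f(x) - f(baseline))
```

In real use the model would be a Transformer classifier and the gradients would come from autograd; the path-averaging logic is the same.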
📝 Abstract
As Natural Language Processing (NLP) models continue to evolve and become integral to high-stakes applications, ensuring their interpretability remains a critical challenge. Given the growing variety of explainability methods and diverse stakeholder requirements, frameworks that help stakeholders select explanations tailored to their specific use cases are increasingly important. To address this need, we introduce EvalxNLP, a Python framework for benchmarking state-of-the-art feature attribution methods for transformer-based NLP models. EvalxNLP integrates eight widely recognized explainability techniques from the Explainable AI (XAI) literature, enabling users to generate and evaluate explanations based on key properties such as faithfulness, plausibility, and complexity. Our framework also provides interactive, LLM-based textual explanations, helping users understand the generated explanations and evaluation outcomes. Human evaluation results indicate high user satisfaction with EvalxNLP, suggesting it is a promising framework for benchmarking explanation methods across diverse user groups. By offering a user-friendly and extensible platform, EvalxNLP aims to democratize explainability tools and support the systematic comparison and advancement of XAI techniques in NLP.
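Of the three evaluation properties named above, faithfulness is the most mechanical to illustrate. A common family of faithfulness metrics (sometimes called comprehensiveness) masks the features an explanation ranks as most important and measures how much the model's score drops: a faithful explanation identifies features whose removal actually hurts the prediction. The sketch below is our own toy illustration of that idea, not EvalxNLP's metric implementation; all names are illustrative.

```python
import numpy as np

def comprehensiveness(predict, x, attributions, k, mask_value=0.0):
    """Score drop after masking the k features ranked highest by the
    explanation. Larger drops suggest a more faithful explanation."""
    top_k = np.argsort(attributions)[::-1][:k]  # indices, descending
    x_masked = x.copy()
    x_masked[top_k] = mask_value
    return predict(x) - predict(x_masked)

# Toy linear scorer whose truly important features are known (0 and 3).
w = np.array([3.0, 0.1, -0.2, 2.0])
predict = lambda x: float(w @ x)
x = np.ones(4)

faithful_attr = w * x                          # matches the model
random_attr = np.array([0.0, 1.0, 1.0, 0.0])   # a poor explanation

print(comprehensiveness(predict, x, faithful_attr, k=2))  # large drop
print(comprehensiveness(predict, x, random_attr, k=2))    # small drop
```

For text models, masking typically means deleting or replacing tokens rather than zeroing a feature vector, but the comparison logic is unchanged: better explanations produce larger drops.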