🤖 AI Summary
In high-stakes domains such as credit scoring, the quality of model explanations varies significantly, undermining user trust and decision reliability. To address this, we propose LtX, a novel framework introducing the first *explanation quality rejection mechanism*, which enables classifiers to actively abstain from predicting on inputs for which high-quality explanations cannot be reliably generated. At its core, LtX employs ULER (User-centric Low-quality Explanation Rejector), a lightweight module that learns a simple rejector from human ratings and per-feature relevance judgments so as to mirror human assessments of explanation quality for popular attribution techniques. Evaluated on eight classification and regression benchmarks and a newly released, human-annotated dataset, ULER consistently outperforms state-of-the-art and explanation-aware learning-to-reject strategies at detecting low-quality explanations. This work supports trustworthy deployment of explainable AI systems by explicitly coupling predictive performance with explanation quality assurance.
📝 Abstract
Machine learning predictors are increasingly employed in high-stakes applications such as credit scoring. Explanations help users unpack the reasons behind a model's predictions, but they are not always "high quality": end-users may have difficulty interpreting or believing them, which complicates trust assessment and downstream decision-making. We argue that classifiers should have the option to refuse handling inputs whose predictions cannot be explained properly, and we introduce a framework for learning to reject low-quality explanations (LtX), in which predictors are equipped with a rejector that evaluates the quality of explanations. In this problem setting, the key challenges are how to properly define and assess explanation quality and how to design a suitable rejector. Focusing on popular attribution techniques, we introduce ULER (User-centric Low-quality Explanation Rejector), which learns a simple rejector from human ratings and per-feature relevance judgments to mirror human assessments of explanation quality. Our experiments show that ULER outperforms both state-of-the-art and explanation-aware learning-to-reject strategies at LtX on eight classification and regression benchmarks and on a new human-annotated dataset, which we will publicly release to support future research.
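To make the LtX setup concrete, the following is a minimal sketch of the general idea under simplifying assumptions: a rejector is trained on human quality ratings of explanations (here, a plain ridge regressor over per-input explanation features), and the wrapped classifier abstains on inputs whose predicted explanation quality falls below a threshold. All class and function names (`ExplanationQualityRejector`, `predict_with_rejection`) are illustrative, not the paper's API, and the synthetic data stands in for real attribution summaries and ratings.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import Ridge


class ExplanationQualityRejector:
    """Illustrative rejector: scores explanation quality, flags rejects."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.scorer = Ridge()  # a deliberately simple, lightweight rejector

    def fit(self, expl_features, human_ratings):
        # expl_features: per-input summaries of the explanation
        # (e.g., attribution statistics); human_ratings: scores in [0, 1].
        self.scorer.fit(expl_features, human_ratings)
        return self

    def accept(self, expl_features):
        # Accept only inputs whose predicted quality clears the threshold.
        return self.scorer.predict(expl_features) >= self.threshold


def predict_with_rejection(clf, X, expl_features, rejector):
    # Predict where the explanation is deemed high quality; abstain
    # (return None) elsewhere.
    keep = rejector.accept(expl_features)
    preds = clf.predict(X)
    return [p if k else None for p, k in zip(preds, keep)]


# Synthetic demo: the human rating equals the 1-D explanation feature,
# so the rejector learns a near-identity quality score.
feats = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
rejector = ExplanationQualityRejector(threshold=0.5).fit(feats, feats.ravel())

clf = DummyClassifier(strategy="constant", constant=1)
clf.fit(feats, np.ones(101, dtype=int))

# One low-quality-explanation input (0.1) and one high-quality one (0.9).
decisions = predict_with_rejection(
    clf, np.array([[0.1], [0.9]]), np.array([[0.1], [0.9]]), rejector
)
```

Here `decisions` holds `None` for the first input (prediction withheld because its explanation scores poorly) and the class label for the second. A real instantiation would replace the synthetic features with summaries of actual attribution maps and the identity ratings with collected human judgments.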