🤖 AI Summary
In high-stakes domains such as credit scoring, the quality of model explanations varies significantly, undermining user trust and decision reliability. To address this, we propose LtX, a novel framework introducing the first *explanation quality rejection mechanism*, which enables classifiers to actively abstain from predicting on inputs for which high-quality explanations cannot be reliably generated. At its core, LtX employs ULER (User-centric Low-quality Explanation Rejector), a lightweight module that learns a simple rejector from human ratings and per-feature relevance judgments so as to mirror human assessments of explanation quality for popular attribution techniques. Evaluated on eight classification and regression benchmarks and a newly released, human-annotated dataset, ULER consistently outperforms state-of-the-art and explanation-aware learning-to-reject strategies at detecting low-quality explanations. This work supports trustworthy deployment of explainable AI systems by explicitly coupling predictive performance with explanation quality assurance.
📝 Abstract
Machine learning predictors are increasingly employed in high-stakes applications such as credit scoring. Explanations help users unpack the reasons behind a model's predictions, but they are not always "high quality": end-users may have difficulty interpreting or believing them, which complicates trust assessment and downstream decision-making. We argue that classifiers should have the option to refuse handling inputs whose predictions cannot be explained properly, and we introduce a framework for learning to reject low-quality explanations (LtX), in which predictors are equipped with a rejector that evaluates the quality of explanations. In this problem setting, the key challenges are how to properly define and assess explanation quality and how to design a suitable rejector. Focusing on popular attribution techniques, we introduce ULER (User-centric Low-quality Explanation Rejector), which learns a simple rejector from human ratings and per-feature relevance judgments to mirror human assessments of explanation quality. Our experiments show that ULER outperforms both state-of-the-art and explanation-aware learning-to-reject strategies at LtX on eight classification and regression benchmarks and on a new human-annotated dataset, which we will publicly release to support future research.
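To make the LtX setup concrete, the following is a minimal sketch of the general idea under simplifying assumptions: a rejector is trained on human quality ratings of explanations (here, a plain ridge regressor over per-input explanation features), and the wrapped classifier abstains on inputs whose predicted explanation quality falls below a threshold. All class and function names (`ExplanationQualityRejector`, `predict_with_rejection`) are illustrative, not the paper's API, and the synthetic data stands in for real attribution summaries and ratings.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import Ridge


class ExplanationQualityRejector:
    """Illustrative rejector: scores explanation quality, flags rejects."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.scorer = Ridge()  # a deliberately simple, lightweight rejector

    def fit(self, expl_features, human_ratings):
        # expl_features: per-input summaries of the explanation
        # (e.g., attribution statistics); human_ratings: scores in [0, 1].
        self.scorer.fit(expl_features, human_ratings)
        return self

    def accept(self, expl_features):
        # Accept only inputs whose predicted quality clears the threshold.
        return self.scorer.predict(expl_features) >= self.threshold


def predict_with_rejection(clf, X, expl_features, rejector):
    # Predict where the explanation is deemed high quality; abstain
    # (return None) elsewhere.
    keep = rejector.accept(expl_features)
    preds = clf.predict(X)
    return [p if k else None for p, k in zip(preds, keep)]


# Synthetic demo: the human rating equals the 1-D explanation feature,
# so the rejector learns a near-identity quality score.
feats = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
rejector = ExplanationQualityRejector(threshold=0.5).fit(feats, feats.ravel())

clf = DummyClassifier(strategy="constant", constant=1)
clf.fit(feats, np.ones(101, dtype=int))

# One low-quality-explanation input (0.1) and one high-quality one (0.9).
decisions = predict_with_rejection(
    clf, np.array([[0.1], [0.9]]), np.array([[0.1], [0.9]]), rejector
)
```

Here `decisions` holds `None` for the first input (prediction withheld because its explanation scores poorly) and the class label for the second. A real instantiation would replace the synthetic features with summaries of actual attribution maps and the identity ratings with collected human judgments.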