Interpretable Failure Detection with Human-Level Concepts

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural network failure detection remains unreliable in safety-critical applications—particularly due to overconfidence under misclassification and the lack of interpretability in existing logit-based confidence estimation methods. Method: This paper proposes the first dual-objective framework grounded in human-understandable visual concepts. It models concept activation, introduces an ordinal ranking mechanism, and fuses multi-source signals to generate fine-grained, interpretable confidence scores—enabling transparent failure attribution without modifying the base model architecture. Contribution/Results: The method achieves significant improvements via post-hoc processing: false positive rates decrease by 3.7% on ImageNet and 9.0% on EuroSAT. It simultaneously ensures high detection reliability, strong interpretability, and deployment efficiency—bridging critical gaps between robustness, transparency, and practicality in real-world vision systems.

📝 Abstract
Reliable failure detection holds paramount importance in safety-critical applications. Yet, neural networks are known to produce overconfident predictions for misclassified samples. As a result, failure detection remains problematic because existing confidence score functions rely on category-level signals, the logits, to detect failures. This research introduces a strategy that leverages human-level concepts for a dual purpose: to reliably detect when a model fails and to transparently interpret why. By integrating a nuanced array of signals for each category, our method enables a finer-grained assessment of the model's confidence. We present a simple yet highly effective approach based on the ordinal ranking of concept activations on the input image. Without bells and whistles, our method significantly reduces the false positive rate across diverse real-world image classification benchmarks, specifically by 3.7% on ImageNet and 9% on EuroSAT.
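The core idea in the abstract, scoring confidence by how highly the predicted class's concepts rank among all concept activations, can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the function name, the normalization, and the assumption that concept activations arrive as a precomputed similarity vector are all choices made here for clarity.

```python
import numpy as np

def ordinal_concept_confidence(activations, class_concepts, pred_class):
    """Toy ordinal-ranking confidence score (illustrative, not the paper's method).

    activations:    (num_concepts,) array of concept-to-image similarity scores
    class_concepts: dict mapping class id -> list of concept indices for that class
    pred_class:     the model's predicted class id

    Returns a score in [0, 1]: 1.0 when the predicted class's concepts occupy
    the top ranks, 0.0 when they occupy the bottom ranks.
    """
    order = np.argsort(-activations)            # concept indices, highest activation first
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(activations))  # rank of each concept (0 = most active)
    own = ranks[class_concepts[pred_class]]     # ranks of the predicted class's concepts
    k = len(own)
    best = np.arange(k).sum()                                         # ideal: top-k ranks
    worst = np.arange(len(activations) - k, len(activations)).sum()   # worst: bottom-k ranks
    return float((worst - own.sum()) / (worst - best))
```

For example, if an image activates the predicted class's two concepts more strongly than every other concept, the score is 1.0; if they fall to the bottom of the ranking, it is 0.0, flagging a likely failure.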
Problem

Research questions and friction points this paper is trying to address.

Interpret neural network failures
Enhance failure detection reliability
Use human-level concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-level concept integration
Ordinal ranking of concept activations
Reduced false positive rate
Kien X. Nguyen
University of Delaware
Tang Li
Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA
Xi Peng
Department of Computer and Information Sciences, University of Delaware, Newark, DE, USA