When Can We Trust Deep Neural Networks? Towards Reliable Industrial Deployment with an Interpretability Guide

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the inability of deep neural networks to flag their own erroneous predictions, such as false negatives, in industrial defect detection, which compromises reliability in safety-critical applications. The authors propose a post-hoc, explanation-based reliability indicator: the difference in Intersection over Union (IoU) between class-specific and class-agnostic discriminative saliency maps serves as a reliability score. An adversarial enhancement step is introduced to amplify this discrepancy and improve error detection. According to the authors, this is the first method to proactively identify potentially erroneous outputs of binary defect detection networks, and it motivates a "data-model-explanation-output" deployment paradigm intended to make deployed AI systems more trustworthy. Evaluated on two industrial defect detection benchmarks, the method achieves 100% recall for false negatives when combined with adversarial enhancement, at the cost of a trade-off for true negatives.

📝 Abstract

The deployment of AI systems in safety-critical domains, such as industrial defect inspection, autonomous driving, and medical diagnosis, is severely hampered by their lack of reliability. A single undetected erroneous prediction can lead to catastrophic outcomes. Unfortunately, there is often no alternative but to place trust in the outputs of a trained AI system, which operates without an internal safeguard to flag unreliable predictions, even in cases of high accuracy. We propose a post-hoc explanation-based indicator to detect false negatives in binary defect detection networks. To our knowledge, this is the first method to proactively identify potentially erroneous network outputs. Our core idea leverages the difference between class-specific discriminative heatmaps and class-agnostic ones. We compute the difference in their intersection over union (IoU) as a reliability score. An adversarial enhancement method is further introduced to amplify this disparity. Evaluations on two industrial defect detection benchmarks show our method effectively identifies false negatives. With adversarial enhancement, it achieves 100% recall, albeit with a trade-off for true negatives. Our work thus advocates for a new and trustworthy deployment paradigm: data-model-explanation-output, moving beyond conventional end-to-end systems to provide critical support for reliable AI in real-world applications.
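
The abstract does not spell out how the IoU-based score is computed, so the following is only a minimal sketch of one possible reading, not the authors' formula. It assumes both heatmaps are normalized to [0, 1], binarizes them at a hypothetical threshold, and treats 1 - IoU as a disagreement score. The function names, the threshold, and the review cutoff are all illustrative assumptions, and nested lists stand in for image arrays to keep the sketch dependency-free.

```python
from typing import List

Heatmap = List[List[float]]  # 2D saliency map, values assumed to lie in [0, 1]
Mask = List[List[bool]]


def binarize(heatmap: Heatmap, threshold: float = 0.5) -> Mask:
    """Mark pixels whose saliency reaches the (assumed) threshold."""
    return [[value >= threshold for value in row] for row in heatmap]


def iou(mask_a: Mask, mask_b: Mask) -> float:
    """Intersection over union of two binary masks of equal shape."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    # Assumption: two empty masks are treated as fully agreeing.
    return intersection / union if union else 1.0


def disagreement_score(class_specific: Heatmap, class_agnostic: Heatmap,
                       threshold: float = 0.5) -> float:
    """1 - IoU of the binarized maps; higher means the explanations diverge more."""
    return 1.0 - iou(binarize(class_specific, threshold),
                     binarize(class_agnostic, threshold))


def flag_for_review(predicted_defect: bool, score: float,
                    cutoff: float = 0.5) -> bool:
    """Flag a 'no defect' prediction as a possible false negative (hypothetical rule)."""
    return (not predicted_defect) and score > cutoff
```

In a "data-model-explanation-output" pipeline of the kind the abstract advocates, a check like `flag_for_review` would sit between the explanation step and the final output, routing suspicious negative predictions to a human inspector. A lower cutoff would catch more false negatives while also flagging more true negatives, which mirrors the trade-off the abstract reports.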

Problem

Research questions and friction points this paper is trying to address.

reliability

false negatives

deep neural networks

safety-critical applications

trustworthy AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

post-hoc interpretability

reliability assessment

false negative detection