🤖 AI Summary
To address the opacity of decision-making in face presentation attack detection (PAD) models, this paper proposes Ensemble-CAM—a model-agnostic interpretability method that aggregates multiple class activation mapping (CAM) techniques to generate robust, high-fidelity saliency maps highlighting critical discriminative regions underlying liveness/fake decisions. Unlike gradient-based or architecture-specific approaches, Ensemble-CAM requires no modification to the original PAD network and is compatible with mainstream deep learning-based PAD models. Experimental results across multiple benchmark datasets demonstrate that Ensemble-CAM yields more stable and spatially accurate explanations compared to individual CAM variants, while preserving near-original detection performance (AUC degradation < 0.3%). By enhancing decision transparency and user trust without compromising accuracy, Ensemble-CAM provides a practical, deployable solution for building reliable, interpretable face recognition systems in high-security applications.
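The summary above describes aggregating the outputs of several CAM techniques into one robust saliency map. The paper's exact fusion rule is not given here, so the sketch below assumes a simple scheme: min-max normalize each variant's map so no single technique dominates, then average. The function name `ensemble_cam` and the normalization choice are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ensemble_cam(saliency_maps):
    """Fuse saliency maps from multiple CAM variants (e.g. Grad-CAM,
    Grad-CAM++, Score-CAM) into one map.

    Each input is a 2D array over the image; maps are min-max
    normalized to [0, 1] before averaging, so differently scaled
    CAM outputs contribute equally to the ensemble.
    """
    normalized = []
    for m in saliency_maps:
        m = np.asarray(m, dtype=np.float64)
        rng = m.max() - m.min()
        # A constant map carries no localization signal; treat it as zero.
        normalized.append((m - m.min()) / rng if rng > 0 else np.zeros_like(m))
    return np.mean(normalized, axis=0)
```

In use, the per-variant maps would first be upsampled to a common resolution; the fused map then highlights regions that several CAM techniques agree on, which is the intuition behind the stability claim in the summary.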
📝 Abstract
Presentation attacks represent a critical security threat in which adversaries use fake biometric data, such as face, fingerprint, or iris images, to gain unauthorized access to protected systems. Various presentation attack detection (PAD) systems leveraging deep learning (DL) models have been designed to mitigate this threat. Despite their effectiveness, most DL models function as black boxes: their decisions are opaque to users. Explainability techniques aim to provide detailed information about the reasons behind the behavior or decisions of DL models. In particular, visual explanation is necessary to better understand the predictions of DL-based PAD systems and to identify the key regions on the basis of which a biometric image is classified as real or fake. In this work, a novel technique, Ensemble-CAM, is proposed for providing visual explanations of the decisions made by DL-based face PAD systems. Our goal is to improve DL-based face PAD systems by providing a better understanding of their behavior. The resulting visual explanations enhance the transparency and trustworthiness of DL-based face PAD systems.