VerLM: Explaining Face Verification Using Natural Language

📅 2026-01-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of decision transparency in existing face verification systems, which struggle to provide trustworthy explanations. The authors propose a novel vision-language model capable of simultaneously determining whether two facial images belong to the same identity and generating natural language explanations—either concise or detailed. The approach innovatively introduces cross-modal transfer learning into the explainable face verification task, integrates complementary explanation styles, and combines state-of-the-art vision-language architectures with a visual reasoning mechanism inspired by audio difference modeling. Experimental results demonstrate that the proposed model significantly outperforms current baselines in both verification accuracy and explanation quality, highlighting the substantial potential of vision-language models for interpretable face verification.

📝 Abstract
Face verification systems have seen substantial advancements; however, they often lack transparency in their decision-making processes. In this paper, we introduce an innovative Vision-Language Model (VLM) for Face Verification, which not only accurately determines if two face images depict the same individual but also explicitly explains the rationale behind its decisions. Our model is uniquely trained using two complementary explanation styles: (1) concise explanations that summarize the key factors influencing its decision, and (2) comprehensive explanations detailing the specific differences observed between the images. We adapt and enhance a state-of-the-art modeling approach originally designed for audio-based differentiation to suit visual inputs effectively. This cross-modal transfer significantly improves our model's accuracy and interpretability. The proposed VLM integrates sophisticated feature extraction techniques with advanced reasoning capabilities, enabling clear articulation of its verification process. Our approach demonstrates superior performance, surpassing baseline methods and existing models. These findings highlight the immense potential of vision-language models in the face verification setting, contributing to more transparent, reliable, and explainable face verification systems.
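The abstract's core interface can be pictured as a function that takes two face embeddings and returns both a verification decision and the two explanation styles the paper describes. The sketch below is purely illustrative: the cosine-similarity backbone, the threshold value, and the placeholder attribute list are assumptions, not the paper's actual VLM, which would generate free-form language rather than templates.

```python
import math

# Hypothetical attributes a detailed explanation might reference;
# the real model would describe observed differences in natural language.
ATTRIBUTES = ["eye shape", "nose bridge", "jawline"]

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def verify_and_explain(emb_a, emb_b, threshold=0.5):
    """Return (same_identity, concise_explanation, detailed_explanation).

    Mirrors the paper's two explanation styles with simple templates;
    the threshold and similarity measure are illustrative stand-ins.
    """
    sim = cosine_similarity(emb_a, emb_b)
    same = sim >= threshold
    concise = (
        f"Same identity (similarity {sim:.2f} >= {threshold})."
        if same
        else f"Different identities (similarity {sim:.2f} < {threshold})."
    )
    # A comprehensive explanation would enumerate specific facial
    # differences; here we only template over placeholder attributes.
    detailed = concise + " Regions compared: " + ", ".join(ATTRIBUTES) + "."
    return same, concise, detailed
```

The design point the paper argues for is that the decision and its explanation come from one model, rather than bolting a post-hoc explainer onto a black-box verifier.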
Problem

Research questions and friction points this paper is trying to address.

face verification
explainability
transparency
vision-language model
decision rationale
Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Model
Explainable AI
Face Verification
Cross-modal Transfer
Natural Language Explanation
Syed Abdul Hannan
Carnegie Mellon University, Pittsburgh, USA
Hazim T. Bukhari
Carnegie Mellon University, Pittsburgh, USA
Thomas Cantalapiedra
Carnegie Mellon University, Pittsburgh, USA
Eman Ansar
Carnegie Mellon University, Pittsburgh, USA
Massa Baali
Carnegie Mellon University
Speech and Audio Processing · Deep Learning
Rita Singh
Carnegie Mellon University, Pittsburgh, USA
Bhiksha Raj
Carnegie Mellon University
Deep Learning · Artificial Intelligence · Speech and Audio Processing · Signal Processing · Machine Learning