Exploring Vision-Language Models for Online Signature Verification: A Zero-Shot Capability Study

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work presents the first exploration of the zero-shot applicability of vision-language models (VLMs) to online signature verification. The study converts temporal signature data into static images that encode pressure information where available, applies state-of-the-art VLMs (GPT-5.2 and Gemini 2.5 Pro) to zero-shot verification, and introduces a biometric scoring mechanism based on token probabilities. In random forgery scenarios, GPT-5.2 reaches an equal error rate of 0.32% in mobile tasks, outperforming current supervised state-of-the-art methods. Performance degrades significantly on skilled forgeries, however, and chain-of-thought (CoT) reasoning induces a “rationalization trap” that lowers verification accuracy. The study thus shows both the potential and the limitations of VLMs in high-precision biometric recognition.
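The summary names two quantities without defining them: a score derived from token probabilities, and the equal error rate (EER). The sketch below shows one plausible way to compute each. It is not the paper's protocol: the answer tokens `genuine` and `forgery`, the shape of `top_logprobs` (a mapping from candidate first-answer tokens to log-probabilities), and both function names are assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, Sequence


def genuine_score(
    top_logprobs: Dict[str, float],
    genuine_tokens: Iterable[str] = ("genuine", "Genuine"),  # hypothetical labels
    forgery_tokens: Iterable[str] = ("forgery", "Forgery"),  # hypothetical labels
) -> float:
    """Turn log-probabilities of the answer token into a score in [0, 1].

    The score is P(genuine) renormalised over the two answer classes,
    so tokens unrelated to either class are ignored.
    """
    p_gen = sum(math.exp(top_logprobs[t]) for t in genuine_tokens if t in top_logprobs)
    p_forg = sum(math.exp(top_logprobs[t]) for t in forgery_tokens if t in top_logprobs)
    total = p_gen + p_forg
    if total == 0.0:
        return 0.5  # neither class appears among the candidates: no evidence
    return p_gen / total


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Approximate the EER by sweeping a threshold over the observed scores.

    A comparison is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance rate (FAR) and the
    false rejection rate (FRR) are closest, as the mean of the two.
    """
    thresholds = sorted(set(genuine) | set(impostor)) + [float("inf")]
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        frr = sum(s < t for s in genuine) / len(genuine)
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

A continuous score of this kind is what makes an EER computable at all: a hard yes/no answer from the model yields a single operating point, whereas a probability can be thresholded at any level.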

📝 Abstract

Recent advancements in Vision-Language Models (VLMs) have demonstrated strong capabilities in general visual reasoning, yet their applicability to rigorous biometric tasks remains unexplored. This work presents an exploratory study evaluating the zero-shot performance of state-of-the-art VLMs (GPT-5.2 and Gemini 2.5 Pro) on the Signature Verification Challenge (SVC) benchmark. To enable visual processing, raw kinematic time series are converted into static images, with pressure information encoded as stroke opacity whenever it is available in the source data. Furthermore, we introduce a scoring protocol that extracts latent token probabilities to compute robust biometric scores. Experimental results reveal a significant performance dichotomy dependent on signal quality and forgery type. In random forgery scenarios, the zero-shot VLM achieves exceptional discrimination, with GPT-5.2 reaching an Equal Error Rate of 0.32% in mobile tasks, outperforming supervised state-of-the-art systems. Conversely, in skilled forgery scenarios, where the two signatures are almost identical and the task is therefore harder, performance is significantly worse, and a critical "Rationalization Trap" emerges: chain-of-thought (CoT) reasoning degrades performance as the model produces kinematic hallucinations to justify forgery artifacts as natural variability.
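The conversion from a pen trajectory to a static image with pressure-dependent opacity can be sketched as follows. This is a minimal illustration and not the authors' rendering pipeline: the SVG output format, the canvas size, the stroke width, the `min_opacity` floor, and the assumption that pressure is already normalised to [0, 1] are all choices made here. Samples without a pressure value are drawn fully opaque, mirroring the abstract's "whenever available".

```python
from typing import List, Optional, Tuple

# One pen sample: x, y, and pressure in [0, 1] (None if the device has no pressure).
Sample = Tuple[float, float, Optional[float]]


def signature_to_svg(
    strokes: List[List[Sample]],
    size: int = 256,
    margin: int = 8,
    min_opacity: float = 0.2,
) -> str:
    """Render pen-down strokes as SVG line segments; pressure sets stroke opacity."""
    points = [s for stroke in strokes for s in stroke]
    if not points:
        raise ValueError("no samples to render")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # One scale for both axes preserves the aspect ratio of the signature;
    # fall back to 1.0 when every sample sits at the same position.
    span = max(max(xs) - x0, max(ys) - y0) or 1.0
    scale = (size - 2 * margin) / span

    def to_px(x: float, y: float) -> Tuple[float, float]:
        # Device coordinates are used as-is; flip y here if the tablet's axis points up.
        return margin + (x - x0) * scale, margin + (y - y0) * scale

    lines = []
    for stroke in strokes:
        # Consecutive samples within a stroke form one segment each.
        for (xa, ya, pa), (xb, yb, pb) in zip(stroke, stroke[1:]):
            if pa is None or pb is None:
                opacity = 1.0  # no pressure channel: draw fully opaque
            else:
                mean_p = min(max((pa + pb) / 2.0, 0.0), 1.0)
                # The floor keeps very light strokes faintly visible.
                opacity = min_opacity + (1.0 - min_opacity) * mean_p
            (x1, y1), (x2, y2) = to_px(xa, ya), to_px(xb, yb)
            lines.append(
                f'<line x1="{x1:.1f}" y1="{y1:.1f}" x2="{x2:.1f}" y2="{y2:.1f}" '
                f'stroke="black" stroke-width="2" stroke-linecap="round" '
                f'stroke-opacity="{opacity:.2f}"/>'
            )
    body = "\n".join(lines)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
        f'<rect width="100%" height="100%" fill="white"/>\n{body}\n</svg>'
    )
```

Splitting the input into separate strokes keeps pen-up movements out of the image, so only ink that was actually laid down reaches the model. The SVG would still need to be rasterised before being sent to a VLM that accepts bitmap input.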

Problem

Research questions and friction points this paper is trying to address.

Vision-Language Models

Online Signature Verification

Zero-Shot Learning

Biometric Authentication

Forgery Detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Vision-Language Models

Zero-Shot Learning

Online Signature Verification