🤖 AI Summary
This study addresses two challenges in online signature verification: high intra-class variability and very few enrollment samples. Both make it hard to reliably separate skilled forgeries from genuine signatures. The proposed approach first converts temporal signature signals into six-channel asymmetric Gramian Angular Field (GASF/GADF) images, which allows 2D vision backbone networks to be used. A dual-branch ConvNeXt-Tiny encoder captures co-occurrence features (GASF) and directional transition features (GADF) separately. Bidirectional cross-attention then fuses the two branches so that each can draw on the complementary temporal structure encoded by the other. Verification is performed in a metric space using cosine similarity. On DeepSignDB and BiosecurID, the method outperforms sequence-based baselines trained under identical objectives, indicating that the representational gain from 2D temporal encoding is consistent and does not depend on the training procedure.
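To make the signal-to-image step concrete, the following is a minimal NumPy sketch of the standard Gramian Angular Field construction applied to the three kinematic channels named in the abstract (pen speed, pressure derivative, direction angle). It is not the authors' implementation. The helper names (`rescale`, `gaf`, `kinematic_channels`, `six_channel_image`) are hypothetical, and several choices are assumptions: uniformly sampled x, y and pressure arrays, finite differences via `np.gradient`, per-channel min-max rescaling to [-1, 1], and stacking the three GASF channels before the three GADF channels. Resizing to the backbone's input resolution is also omitted.

```python
import numpy as np


def rescale(series: np.ndarray) -> np.ndarray:
    """Min-max rescale a 1D series to [-1, 1] (usual GAF preprocessing)."""
    lo, hi = series.min(), series.max()
    if hi == lo:  # constant series: map to zeros to avoid division by zero
        return np.zeros_like(series, dtype=float)
    return (2.0 * series - hi - lo) / (hi - lo)


def gaf(series: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (GASF, GADF) for one 1D series, each of shape (T, T)."""
    x = np.clip(rescale(series), -1.0, 1.0)  # guard arccos against rounding
    phi = np.arccos(x)  # polar-angle encoding of each time step
    gasf = np.cos(phi[:, None] + phi[None, :])  # cos(phi_i + phi_j): symmetric
    gadf = np.sin(phi[:, None] - phi[None, :])  # sin(phi_i - phi_j): antisymmetric
    return gasf, gadf


def kinematic_channels(x: np.ndarray, y: np.ndarray, p: np.ndarray):
    """Hypothetical extraction of pen speed, pressure derivative and
    direction angle, assuming uniform sampling (unit time step)."""
    dx, dy, dp = np.gradient(x), np.gradient(y), np.gradient(p)
    speed = np.hypot(dx, dy)  # magnitude of pen velocity
    direction = np.arctan2(dy, dx)  # heading angle in (-pi, pi]
    return speed, dp, direction


def six_channel_image(x: np.ndarray, y: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Stack GASF and GADF of the three kinematic channels into (6, T, T).

    The order (3 GASF, then 3 GADF) is an assumption, chosen so the array
    splits cleanly into the inputs of two separate encoder branches.
    """
    gasfs, gadfs = zip(*(gaf(c) for c in kinematic_channels(x, y, p)))
    return np.stack(gasfs + gadfs)


# Usage: a synthetic circular stroke with rising pressure.
t = np.linspace(0.0, 1.0, 64)
img = six_channel_image(np.cos(2 * np.pi * t), np.sin(2 * np.pi * t), t ** 2)
print(img.shape)  # (6, 64, 64)
```

Splitting `img[:3]` and `img[3:]` would then give the GASF and GADF inputs for the two branches, under the channel-order assumption above.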
📝 Abstract
Online signature verification (OSV) requires distinguishing skilled forgeries from genuine samples under high intra-class variability and with very few enrollment samples. Existing deep learning methods operate directly on raw temporal sequences, which restricts them to 1D architectures and prevents the use of pretrained 2D vision backbones. We bridge this gap with GAFSV-Net, which represents each signature as a six-channel asymmetric Gramian Angular Field image. Three kinematic channels (pen speed, pressure derivative, direction angle) are each encoded into complementary GASF and GADF matrices, which capture pairwise temporal co-occurrence and directional transition structure, respectively. A dual-branch ConvNeXt-Tiny encoder processes GASF and GADF independently, and bidirectional cross-attention lets each branch query discriminative patterns from the other before metric-space projection. Training uses semi-hard triplet loss with skilled-forgery hard-negative injection; verification is performed via cosine similarity against a small enrollment prototype. We evaluate on DeepSignDB and BiosecurID, where GAFSV-Net outperforms all sequence-based baselines trained under identical objectives. This indicates that the representational gain of 2D temporal encoding is consistent and independent of the training procedure, and ablations characterise the contribution of each design choice.
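The metric-learning side (semi-hard triplet training and prototype-based cosine verification) can be sketched in NumPy as follows. The embeddings are assumed to come from the encoder, which is not shown. All names are hypothetical: `semi_hard_triplet_loss`, `enrollment_prototype`, `verify`, the margin of 0.2, and the acceptance `threshold`. Several details are assumptions rather than the paper's stated method. Cosine distance is used as the triplet metric. Negatives are selected by one common semi-hard rule: the closest negative still farther than the positive, falling back to the farthest negative. Skilled forgeries are "injected" by giving them a label different from their target writer so they enter that writer's negative pool. The prototype is taken to be the normalised mean of the normalised enrollment embeddings.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Project embeddings onto the unit sphere along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def cosine_distance_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance 1 - cos(u, v) for a batch of shape (N, D)."""
    z = l2_normalize(emb)
    return 1.0 - z @ z.T


def semi_hard_triplet_loss(emb: np.ndarray, labels, margin: float = 0.2) -> float:
    """Batch semi-hard triplet loss (one common variant, assumed here).

    Skilled forgeries are assumed to carry a label different from their
    target writer, so they act as negatives for that writer's anchors.
    """
    d = cosine_distance_matrix(emb)
    labels = np.asarray(labels)
    losses = []
    for a in range(len(labels)):
        neg = labels != labels[a]
        if not neg.any():
            continue
        for p in np.flatnonzero(labels == labels[a]):
            if p == a:
                continue
            d_ap = d[a, p]
            d_an = d[a, neg]
            # Semi-hard: closest negative that is still farther than the positive;
            # if none exists, fall back to the farthest negative.
            farther = d_an[d_an > d_ap]
            d_n = farther.min() if farther.size else d_an.max()
            losses.append(max(d_ap - d_n + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0


def enrollment_prototype(enroll_emb: np.ndarray) -> np.ndarray:
    """Unit-norm mean of unit-norm enrollment embeddings, shape (D,)."""
    return l2_normalize(l2_normalize(enroll_emb).mean(axis=0))


def verify(query_emb: np.ndarray, enroll_emb: np.ndarray, threshold: float = 0.5):
    """Return (cosine score, accept?) for one query against a writer's enrollment set.

    The threshold is hypothetical; in practice it would be tuned on held-out data.
    """
    score = float(l2_normalize(query_emb) @ enrollment_prototype(enroll_emb))
    return score, score >= threshold
```

Averaging unit-normalised embeddings before renormalising keeps every enrollment sample's direction equally weighted, regardless of embedding magnitude. This is one simple way to form a prototype from very few references.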