🤖 AI Summary
To address the weak generalization capability of single models and performance degradation in cross-distance iris recognition, this paper systematically investigates the complementarity and interpretability of SqueezeNet, MobileNetV2, and ResNet50 for periocular verification. Leveraging LIME-based visualization and Jensen–Shannon divergence, we quantify inter-model differences in attention distributions over local image regions—including eyelashes, scleral texture, and pupil boundaries. Building on these insights, we propose a dual-metric feature representation combining cosine similarity and chi-square distance, followed by logistic regression for score-level fusion. Evaluated on the UBIPr dataset, the three-network ensemble achieves state-of-the-art performance, improving cross-distance recognition accuracy by 4.2% over prior methods. This work is the first to explicitly model CNN complementarity from an interpretability perspective and empirically validate its tangible contribution to robust iris recognition.
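The inter-model comparison above hinges on the Jensen–Shannon divergence between attention distributions. A minimal sketch of that computation, assuming each LIME heatmap is flattened and normalised into a probability distribution over image regions (the exact preprocessing in the paper may differ):

```python
import numpy as np

def js_divergence(heatmap_a, heatmap_b, eps=1e-12):
    """Jensen-Shannon divergence (base 2) between two attention heatmaps.

    Each heatmap is flattened and renormalised into a probability
    distribution over image regions: 0 means identical attention,
    1 means fully disjoint attention.
    """
    p = np.abs(np.asarray(heatmap_a, dtype=float)).ravel() + eps
    q = np.abs(np.asarray(heatmap_b, dtype=float)).ravel() + eps
    p /= p.sum()
    q /= q.sum()
    m = 0.5 * (p + q)  # mixture distribution
    kl = lambda a, b: np.sum(a * np.log2(a / b))  # KL divergence, base 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

With base-2 logarithms the score is bounded in [0, 1], which makes divergences between different network pairs directly comparable.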
📝 Abstract
We study the complementarity of different CNNs for periocular verification at different distances on the UBIPr database. We train three architectures of increasing complexity (SqueezeNet, MobileNetV2, and ResNet50) on a large set of eye crops from VGGFace2. We analyse performance with cosine and chi-square (χ²) metrics, compare different network initialisations, and apply score-level fusion via logistic regression. In addition, we use LIME heatmaps and Jensen–Shannon divergence to compare the attention patterns of the CNNs. While ResNet50 consistently performs best individually, fusion provides substantial gains, especially when combining all three networks. Heatmaps show that the networks usually focus on distinct regions of a given image, which explains their complementarity. Our method significantly outperforms previous works on UBIPr, achieving a new state of the art.
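The score-level fusion described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the synthetic arrays stand in for real per-pair scores (cosine and chi-square, from each of the three CNNs), and the feature layout and distribution parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 600  # hypothetical number of verification pairs per class

# Synthetic stand-in for per-pair scores: 3 CNNs x 2 metrics = 6 features.
# Real features would be the cosine and chi-square scores produced by
# SqueezeNet, MobileNetV2 and ResNet50; genuine pairs score higher on average.
genuine = rng.normal(loc=0.7, scale=0.15, size=(n, 6))
impostor = rng.normal(loc=0.3, scale=0.15, size=(n, 6))
X = np.vstack([genuine, impostor])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = genuine, 0 = impostor

# Logistic regression learns per-score weights and yields a single
# calibrated match probability per comparison pair.
fusion = LogisticRegression().fit(X, y)
fused_scores = fusion.predict_proba(X)[:, 1]
```

Fusing at the score level keeps each network's pipeline independent; the regressor only needs the six scalar scores per pair, so any subset of networks and metrics can be combined by changing the feature columns.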