FingerVeinSyn-5M: A Million-Scale Dataset and Benchmark for Finger Vein Recognition

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Finger-vein recognition has long been hindered by the scarcity of large-scale, highly diverse public datasets. To address this, we introduce FingerVeinSyn-5M, the first million-scale synthetic finger-vein dataset. It comprises 5 million samples from 50,000 unique fingers, with 100 variations per finger covering geometric changes (shift, rotation, scale, roll) and realistic imaging degradations (exposure variations, skin-scattering blur, optical blur, motion blur). Our core contribution is FVeinSyn, a high-fidelity synthetic engine that achieves fine-grained decoupling of physiological anatomy and imaging physics for the first time, enabling anatomically guided texture generation and controllable, multi-factor degradation simulation. FingerVeinSyn-5M is a fully annotated, large-scale, multi-degradation benchmark. Models pretrained on it and fine-tuned with minimal real data achieve an average 53.91% performance gain across multiple benchmarks. The dataset is publicly released to advance the practical deployment of deep learning-based finger-vein recognition.

📝 Abstract
A major challenge in finger vein recognition is the lack of large-scale public datasets. Existing datasets contain few identities and limited samples per finger, restricting the advancement of deep learning-based methods. To address this, we introduce FVeinSyn, a synthetic generator capable of producing diverse finger vein patterns with rich intra-class variations. Using FVeinSyn, we created FingerVeinSyn-5M -- the largest available finger vein dataset -- containing 5 million samples from 50,000 unique fingers, each with 100 variations including shift, rotation, scale, roll, varying exposure levels, skin scattering blur, optical blur, and motion blur. FingerVeinSyn-5M is also the first to offer fully annotated finger vein images, supporting deep learning applications in this field. Models pretrained on FingerVeinSyn-5M and fine-tuned with minimal real data achieve an average 53.91% performance gain across multiple benchmarks. The dataset is publicly available at: https://github.com/EvanWang98/FingerVeinSyn-5M.
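To make the dataset structure concrete, the following is a minimal sketch of how 100 variation parameter sets per finger could be sampled, giving 50,000 × 100 = 5,000,000 samples. It is not the FVeinSyn implementation: the `Variation` class, the function names, and every parameter range are illustrative assumptions. Only the variation types and the counts come from the abstract.

```python
import random
from dataclasses import dataclass
from typing import List

NUM_FINGERS = 50_000         # unique fingers, as stated in the abstract
VARIATIONS_PER_FINGER = 100  # variations per finger, as stated in the abstract


@dataclass(frozen=True)
class Variation:
    """One hypothetical parameter set; field names and units are assumptions."""
    shift_x: int               # horizontal shift in pixels
    shift_y: int               # vertical shift in pixels
    rotation_deg: float        # in-plane rotation
    scale: float               # isotropic scale factor
    roll_deg: float            # rotation about the finger's long axis
    exposure_gain: float       # multiplicative brightness change
    scatter_blur_sigma: float  # skin scattering blur strength
    optical_blur_sigma: float  # lens (defocus) blur strength
    motion_blur_len: int       # motion blur kernel length in pixels


def sample_variation(rng: random.Random) -> Variation:
    # All ranges below are placeholders chosen for illustration only.
    return Variation(
        shift_x=rng.randint(-10, 10),
        shift_y=rng.randint(-10, 10),
        rotation_deg=rng.uniform(-5.0, 5.0),
        scale=rng.uniform(0.9, 1.1),
        roll_deg=rng.uniform(-15.0, 15.0),
        exposure_gain=rng.uniform(0.6, 1.4),
        scatter_blur_sigma=rng.uniform(0.0, 2.0),
        optical_blur_sigma=rng.uniform(0.0, 1.5),
        motion_blur_len=rng.randint(1, 9),
    )


def variations_for_finger(finger_id: int,
                          n: int = VARIATIONS_PER_FINGER,
                          seed: int = 0) -> List[Variation]:
    # Seed per finger so the same finger always gets the same variations.
    rng = random.Random(seed * NUM_FINGERS + finger_id)
    return [sample_variation(rng) for _ in range(n)]


total_samples = NUM_FINGERS * VARIATIONS_PER_FINGER  # 5,000,000
```

Seeding per finger keeps the sampling reproducible without storing the parameters; a real generator would then render each parameter set into an image of that finger's vein pattern.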
Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale public datasets for finger vein recognition
Limited samples and identities in existing datasets hinder deep learning
Need for synthetic data to enhance recognition model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic generator for diverse vein patterns
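Largest available finger vein dataset, with 5M fully annotated samples
Pretraining plus minimal real-data fine-tuning yields an average 53.91% performance gain across multiple benchmarks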