🤖 AI Summary
This paper summarizes the 2025 Sclera Segmentation Benchmarking Competition (SSBC), which investigated whether purely synthetic eye images can replace real data for training high-performance, privacy-preserving sclera segmentation models. Nine participating research groups submitted models spanning transformer-based, lightweight, and generative-model-guided segmentation architectures. These models were evaluated on three test sets containing synthetic and real-world images collected under diverse conditions. The best model trained exclusively on synthetic data achieved an F1-score of 0.803, which is competitive with models that also used real data. In the mixed track, methodological design choices (e.g., network architecture and training strategy) often contributed more to performance gains than the inclusion of real samples did. These results support fully synthetic-data-driven sclera segmentation as a way to obtain accurate models without collecting real biometric data, which enables scalable, privacy-compliant biometric recognition in sensitive applications.
📝 Abstract
This paper presents a summary of the 2025 Sclera Segmentation Benchmarking Competition (SSBC), which focused on the development of privacy-preserving sclera-segmentation models trained on synthetically generated ocular images. The goal of the competition was to evaluate how well models trained on synthetic data perform compared with those trained on real-world datasets. The competition featured two tracks: $(i)$ one relying solely on synthetic data for model development, and $(ii)$ one mixing synthetic data with a limited amount of real-world data. A total of nine research groups submitted segmentation models with a variety of architectural designs, including transformer-based solutions, lightweight models, and segmentation networks guided by generative frameworks. Experiments were conducted on three evaluation datasets containing both synthetic and real-world images collected under diverse conditions. Results show that models trained entirely on synthetic data can achieve competitive performance, particularly when dedicated training strategies are employed: the top-performing models in the synthetic data track achieved $F_1$ scores above $0.8$. Moreover, performance gains in the mixed track were often driven more by methodological choices than by the inclusion of real data, highlighting the promise of synthetic data for privacy-aware biometric development. The code and data for the competition are available at: https://github.com/dariant/SSBC_2025.
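The $F_1$ score used to rank the segmentation models is the harmonic mean of precision and recall over sclera pixels, which is equivalent to $F_1 = \frac{2\,TP}{2\,TP + FP + FN}$. The sketch below computes it for binary masks. It assumes a per-image pixel-wise score that is then averaged over a dataset. The exact SSBC protocol (e.g., how probability maps are thresholded, or whether scores are aggregated per image or per dataset) is not stated here. The helper names `binary_f1` and `mean_f1` are hypothetical and do not come from the competition code.

```python
def binary_f1(pred, gt):
    """Pixel-wise F1 for two binary masks given as 2-D lists of 0/1 (1 = sclera).

    Hypothetical helper; assumes masks are already thresholded to {0, 1}.
    """
    if len(pred) != len(gt) or any(len(a) != len(b) for a, b in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # sclera pixel correctly predicted
            elif p:
                fp += 1  # predicted sclera, but background in ground truth
            elif g:
                fn += 1  # sclera pixel missed by the prediction
    if tp + fp + fn == 0:
        # Both masks empty: treated as perfect agreement (an assumed convention).
        return 1.0
    # Equivalent to 2 * precision * recall / (precision + recall).
    return 2 * tp / (2 * tp + fp + fn)


def mean_f1(pairs):
    """Average per-image F1 over (prediction, ground truth) pairs (assumed aggregation)."""
    scores = [binary_f1(p, g) for p, g in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [[0, 1, 1],
            [0, 1, 0]]
    gt = [[0, 1, 0],
          [1, 1, 0]]
    # TP = 2, FP = 1, FN = 1  ->  F1 = 4 / 6
    print(round(binary_f1(pred, gt), 4))  # 0.6667
```

Averaging per-image scores gives every image equal weight regardless of how many sclera pixels it contains. Pooling TP, FP, and FN over the whole dataset before computing $F_1$ is a common alternative and can produce a different ranking.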