🤖 AI Summary
Existing synthetic face generation methods increase intra-class attribute variation without maintaining intra-class identity consistency, which limits the face recognition performance of models trained on their data. To address this, we propose Vec2Face+, a generative model that creates face images directly from image features and adds two components: AttrOP, an attribute optimization algorithm, and a LoRA-based pose control mechanism. Together they allow fine-grained attribute control with strong identity preservation. Using this model, we construct the synthetic datasets VFace10K, VFace100K, and VFace300K. An FR model trained on VFace10K achieves state-of-the-art accuracy on seven real-world test sets, and VFace100K and VFace300K yield higher accuracy than CASIA-WebFace on five real-world test sets. This is the first time a synthetic dataset surpasses CASIA-WebFace in average accuracy, which supports jointly improving inter-class separability, intra-class variation, and identity consistency.
📝 Abstract
When synthesizing identities as face recognition training data, it is generally believed that large inter-class separability and intra-class attribute variation are essential for synthesizing a quality dataset. This belief is generally correct, and this is what we aim for. However, when increasing intra-class variation, existing methods overlook the necessity of maintaining intra-class identity consistency. To address this and generate high-quality face training data, we propose Vec2Face+, a generative model that creates images directly from image features and allows for continuous and easy control of face identities and attributes. Using Vec2Face+, we obtain datasets with proper inter-class separability, intra-class variation, and identity consistency using three strategies: 1) we sample vectors sufficiently different from others to generate well-separated identities; 2) we propose an AttrOP algorithm for increasing general attribute variations; 3) we propose LoRA-based pose control for generating images with profile head poses, which is more efficient and identity-preserving than AttrOP. Our system generates VFace10K, a synthetic face dataset with 10K identities, which allows an FR model to achieve state-of-the-art accuracy on seven real-world test sets. Scaling the size to 4M and 12M images, the corresponding VFace100K and VFace300K datasets yield higher accuracy than the real-world training dataset, CASIA-WebFace, on five real-world test sets. This is the first time a synthetic dataset beats CASIA-WebFace in average accuracy. In addition, we find that only 1 out of 11 synthetic datasets outperforms random guessing (i.e., 50%) in twin verification and that models trained with synthetic identities are more biased than those trained with real identities. Both are important aspects for future investigation.
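The first strategy, sampling identity vectors that are sufficiently different from one another, can be illustrated with a small rejection-sampling sketch. This is not the authors' procedure: the abstract does not specify how vectors are drawn or how similarity is measured, so the Gaussian sampling, the cosine-similarity criterion, the 64-dimensional space, the 0.3 threshold, and the function name `sample_separated_identities` are all assumptions made for illustration.

```python
import math
import random


def unit_vector(dim, rng):
    """Draw a random direction: Gaussian sample, then normalize to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def sample_separated_identities(n, dim=64, max_sim=0.3, seed=0, max_tries=100_000):
    """Rejection-sample n unit vectors with pairwise cosine similarity <= max_sim.

    dim and max_sim are illustrative values, not taken from the paper.
    """
    rng = random.Random(seed)
    kept = []
    for _ in range(max_tries):
        if len(kept) == n:
            break
        candidate = unit_vector(dim, rng)
        # Accept the candidate only if it is far enough from every kept identity.
        if all(cosine(candidate, other) <= max_sim for other in kept):
            kept.append(candidate)
    if len(kept) < n:
        raise RuntimeError("could not find enough well-separated vectors")
    return kept


identities = sample_separated_identities(n=20)
```

Each accepted vector would then act as the seed feature for one synthetic identity. A lower `max_sim` gives better-separated identities but rejects more candidates, so there is a trade-off between separability and the number of identities that can be sampled.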