🤖 AI Summary
This work proposes an efficient method for estimating the Intrinsic Quality (IQ) of large-scale face recognition datasets without full model training. IQ combines a neighbor-consistency score with the effective rank of the embedding space, giving a lightweight, validation-free quality measure that can be computed with proxy models or dataset subsets and is intended to predict downstream recognition performance. The authors describe an experimental protocol covering clean, noisy, and mixed-quality datasets for validating how well IQ forecasts model performance, with the aim of reducing the cost of data diagnosis and filtering when preprocessing massive face datasets.
📝 Abstract
We propose Intrinsic Quality (IQ), a validation-free metric that estimates how well a face recognition (FR) dataset can support high-performing models, without full-scale training. IQ combines two components: (i) a Neighbor-Consistency Score, which quantifies local identity-label agreement among nearest neighbors, and (ii) Global Representation Subspace Complexity, measured by Effective Rank (ER), which captures the underlying embedding geometry and dataset diversity. Because IQ can be computed from lightweight proxy models or data subsets, it supports dataset diagnosis and curation before resource-intensive full-scale training. We describe an experimental protocol tailored to clean, noisy, and mixed-quality FR datasets, and outline evaluation methodologies to validate IQ's predictive power for downstream performance.
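To make the two components concrete, here is a minimal sketch of how they might be computed from proxy-model embeddings. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not give formulas, so the sketch assumes cosine-similarity k-nearest neighbors for the consistency score and the entropy-based definition of effective rank (the exponential of the Shannon entropy of the normalized singular values). The function names, the parameter `k`, and the centering step are all hypothetical choices. The abstract also does not say how the two components are combined into IQ, so they are returned separately.

```python
import numpy as np


def neighbor_consistency(embeddings, labels, k=5):
    """Mean fraction of each sample's k nearest neighbors sharing its label.

    Assumption: neighbors are ranked by cosine similarity; requires k < n.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)
    # L2-normalize so the dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    x = embeddings / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    # Exclude each sample from its own neighbor list.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar samples per row (order within k is arbitrary).
    nn_idx = np.argpartition(-sim, k, axis=1)[:, :k]
    agree = labels[nn_idx] == labels[:, None]
    return float(agree.mean())


def effective_rank(embeddings, eps=1e-12):
    """Entropy-based effective rank of the (centered) embedding matrix.

    Assumption: ER = exp(H(p)), where p are singular values normalized to sum to 1.
    """
    embeddings = np.asarray(embeddings, dtype=np.float64)
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions before taking logs
    entropy = -(p * np.log(p)).sum()
    return float(np.exp(entropy))
```

Under these assumptions, a dataset whose labels agree with its embedding neighborhoods scores close to 1 on the first measure, and label noise lowers it. The second measure ranges from 1 (all variance along one direction) up to the smaller of the sample count and the embedding dimension.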