🤖 AI Summary
This work addresses the ethical, legal, and privacy concerns raised by facial age estimation, which typically relies on training data that includes images of minors. The authors formalize training without children's data as a generalized zero-shot learning problem and build a standardized benchmark that excludes all samples of individuals under 18 from training. They re-split six widely used datasets with strict age-based partitioning (ages 18-59 for training, validation, and testing; under 18 for zero-shot evaluation; 60+ as an unseen validation set) and use subject-exclusive splits wherever identity annotations are available. Nine state-of-the-art methods are evaluated on their ability to generalize to unseen age groups. All of them degrade substantially relative to the supervised baseline, by 46.4% on average and up to 52.8%, and they systematically pull predictions for unseen ages toward nearby seen ages, exposing a gap in how well current models generalize under ethical data constraints.
📝 Abstract
Age estimation from facial images typically relies on training data that includes images of minors, a practice that raises serious ethical, legal, and privacy concerns. In this work, we propose a generalized zero-shot benchmark for facial age estimation that explicitly excludes children's data during training while still assessing model performance on younger populations. We revisit six widely used datasets and introduce standardized splits with strict age-group separation: samples aged 18-59 for training, validation, and testing; samples under 18 reserved exclusively for zero-shot evaluation; and samples 60+ as an unseen validation set for model selection under distribution shift. For datasets with identity annotations, subject-exclusive splits prevent identity leakage and better reflect real-world deployment conditions. Evaluating nine state-of-the-art age estimation methods under this protocol reveals that all evaluated methods consistently fail to generalize to unseen age groups, suffering substantial performance degradation (on average 46.4%, and up to 52.8%) relative to the supervised baseline. Moreover, models do not simply degrade: they systematically anchor predictions for unseen ages to nearby seen classes, a manifestation of the well-known seen-class bias in generalized zero-shot learning. By formalizing age estimation without children's data as a generalized zero-shot benchmark on existing datasets, this work highlights a critical gap between current modeling practices and real-world ethical constraints. Our benchmark provides a principled basis for evaluating models under restricted data regimes and encourages the development of methods that are robust to distribution shift and aligned with responsible data use.
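The split protocol described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' released code: the function names, the `(subject_id, age)` sample format, and the 80/10/10 ratios are all assumptions made for the example. Only the age boundaries (under 18, 18-59, 60+) and the subject-exclusive requirement come from the abstract.

```python
import random
from collections import defaultdict

def age_partition(age):
    """Map an age to one of the three age groups named in the abstract."""
    if age < 18:
        return "zero_shot_test"   # minors: evaluation only, never trained on
    if age >= 60:
        return "unseen_val"       # 60+: unseen ages used for model selection
    return "seen"                 # 18-59: source of train / val / test

def subject_exclusive_split(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Split seen-age samples so that no subject lands in two subsets.

    `samples` is a list of (subject_id, age) pairs. The ratios are
    hypothetical; the abstract does not state them.
    """
    by_subject = defaultdict(list)
    for sample in samples:
        by_subject[sample[0]].append(sample)
    subjects = sorted(by_subject)              # sort first for reproducibility
    random.Random(seed).shuffle(subjects)
    n_train = int(ratios[0] * len(subjects))
    n_val = int(ratios[1] * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],
    }
    return {name: [s for sid in ids for s in by_subject[sid]]
            for name, ids in groups.items()}

def build_benchmark(samples, seed=0):
    """Route samples by age group, then split the seen group by subject."""
    buckets = defaultdict(list)
    for sample in samples:
        buckets[age_partition(sample[1])].append(sample)
    splits = subject_exclusive_split(buckets["seen"], seed=seed)
    splits["unseen_val"] = buckets["unseen_val"]
    splits["zero_shot_test"] = buckets["zero_shot_test"]
    return splits
```

One simplification to be aware of: this sketch enforces subject exclusivity only among the 18-59 subsets. In a longitudinal dataset, a person photographed at both 17 and 19 could still appear in the zero-shot set and in training; how the benchmark treats such cases is not stated in the abstract.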