🤖 AI Summary
Existing methods for lifelong face age transformation struggle to balance age accuracy against identity preservation (the Age-ID trade-off), particularly under large age gaps and extreme head poses. This paper proposes Cradle2Cane, a two-pass face aging framework built on few-step text-to-image diffusion models. The first pass introduces Adaptive Noise Injection (AdaNI) to improve age controllability, deliberately relaxing identity preservation to allow stronger age transformations. The second pass conditions the model on two identity-aware embeddings, SVR-ArcFace and Rotate-CLIP, to restore identity while keeping the age-specific features produced by the first pass. Both passes are trained jointly end to end. On the CelebA-HQ test set, evaluated under the Face++ and Qwen-VL protocols, the method outperforms existing face aging approaches in both age accuracy and identity consistency.
📝 Abstract
Face aging has become a crucial task in computer vision, with applications ranging from entertainment to healthcare. However, existing methods struggle to achieve a realistic and seamless transformation across the entire lifespan, especially when handling large age gaps or extreme head poses. The core challenge lies in balancing age accuracy and identity preservation, which we refer to as the Age-ID trade-off. Most prior methods prioritize either age transformation at the expense of identity consistency or identity consistency at the expense of age transformation. In this work, we address this issue by proposing a two-pass face aging framework, named Cradle2Cane, based on few-step text-to-image (T2I) diffusion models. The first pass targets age accuracy by introducing an adaptive noise injection (AdaNI) mechanism. This mechanism is guided by a textual condition: a prompt describing the age and gender of the given person. By adjusting the noise level, we control the strength of aging and allow more flexibility in transforming the face. Identity preservation is only weakly enforced in this pass, which facilitates stronger age transformations. In the second pass, we enhance identity preservation while maintaining age-specific features by conditioning the model on two identity-aware embeddings (IDEmb): SVR-ArcFace and Rotate-CLIP. This pass denoises the transformed image from the first pass, ensuring stronger identity preservation without compromising age accuracy. Both passes are jointly trained in an end-to-end way. Extensive experiments on the CelebA-HQ test dataset, evaluated through the Face++ and Qwen-VL protocols, show that Cradle2Cane outperforms existing face aging methods in age accuracy and identity consistency.
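The first-pass idea of tying the noise level to the strength of aging can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the linear schedule in `noise_level_for_age_gap`, its bounds, the prompt wording in `build_prompt`, and the variance-preserving mix in `inject_noise` are all hypothetical stand-ins for whatever AdaNI actually uses. A flat list of floats stands in for an image or latent.

```python
import math
import random


def noise_level_for_age_gap(source_age, target_age,
                            min_level=0.2, max_level=0.8, max_gap=80):
    """Hypothetical schedule: a larger age gap gets a higher noise level.

    The linear form and the bounds are illustrative assumptions,
    not the schedule used by AdaNI.
    """
    gap = min(abs(target_age - source_age), max_gap)
    return min_level + (max_level - min_level) * gap / max_gap


def inject_noise(pixels, level, rng):
    """Mix the input with Gaussian noise in a variance-preserving way.

    level=0 returns the input unchanged; level=1 returns pure noise.
    """
    keep = math.sqrt(1.0 - level)
    add = math.sqrt(level)
    return [keep * p + add * rng.gauss(0.0, 1.0) for p in pixels]


def build_prompt(target_age, gender):
    """Hypothetical text condition carrying the target age and gender."""
    return f"a face photo of a {gender}, {target_age} years old"


rng = random.Random(0)
face = [0.1, -0.3, 0.5, 0.9]  # stand-in for a normalized image or latent

# A 5-year gap keeps most of the input; a 50-year gap injects more noise,
# leaving the (not shown) diffusion model more freedom to change the face.
small_gap_level = noise_level_for_age_gap(30, 35)
large_gap_level = noise_level_for_age_gap(30, 80)
noisy_face = inject_noise(face, large_gap_level, rng)
prompt = build_prompt(80, "man")
```

In a full pipeline, `noisy_face` and `prompt` would be passed to the few-step T2I diffusion model for the first pass, and its output would then be denoised again in the second pass under the SVR-ArcFace and Rotate-CLIP identity embeddings; neither model call is sketched here.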