🤖 AI Summary
Restoring faces in vintage photographs involves compound degradations (structural breakage, severe color fading, and motion/defocus blur) that make it difficult for existing diffusion models to jointly reconstruct local geometry and recover natural chromaticity. To address this, the authors propose Self-Supervised Selective-Guided Diffusion (SSDiff), a diffusion-based framework that generates pseudo-reference faces under weak guidance and uses them for staged optimization of structure and color. SSDiff incorporates facial parsing maps and scratch masks for region-aware guidance and employs a two-stage supervision strategy: structure-prioritized denoising followed by color refinement. For evaluation, the authors introduce VintageFace, a 300-image benchmark of real-world vintage face photographs with varying degradation levels. Extensive experiments show that SSDiff outperforms state-of-the-art GAN- and diffusion-based baselines in perceptual quality, identity preservation, and local controllability.
📝 Abstract
Old-photo face restoration poses significant challenges due to compounded degradations such as breakage, fading, and severe blur. Existing methods guided by pre-trained diffusion models rely either on explicit degradation priors or on global statistical guidance, and thus struggle with localized artifacts and faded face color. We propose Self-Supervised Selective-Guided Diffusion (SSDiff), which leverages pseudo-reference faces generated by a pre-trained diffusion model under weak guidance. These pseudo-labels exhibit structurally aligned contours and natural colors, enabling region-specific restoration via staged supervision: structural guidance is applied throughout the denoising process, while color refinement is applied only in later steps, matching the coarse-to-fine nature of diffusion. By incorporating face parsing maps and scratch masks, our method selectively restores breakage regions while avoiding identity mismatch. We further construct VintageFace, a 300-image benchmark of real old face photographs with varying degradation levels. SSDiff outperforms existing GAN-based and diffusion-based methods in perceptual quality, fidelity, and regional controllability. Code link: https://github.com/PRIS-CV/SSDiff.
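The staged, region-selective supervision described above can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes a toy 1-D signal in place of an image, a box-filter split of the pseudo-reference into "structure" (low-frequency) and "color" (residual) components, and an illustrative cutoff that enables color guidance only in the final 30% of steps. The function names (`staged_guidance_schedule`, `selective_guided_step`) and all constants are hypothetical.

```python
import numpy as np

def staged_guidance_schedule(t, T, color_start_frac=0.3):
    """Weights for timestep t of T (t counts down to 0).

    Structure guidance is active throughout; color guidance switches on
    only in the late, low-noise steps, matching diffusion's coarse-to-fine
    behavior. The 0.3 cutoff is an illustrative assumption, not a value
    from the paper.
    """
    w_struct = 1.0
    w_color = 1.0 if t < color_start_frac * T else 0.0
    return w_struct, w_color

# Triangular smoothing kernel (box * box), used as a crude "structure" extractor.
KERNEL = np.convolve(np.ones(3) / 3, np.ones(3) / 3)

def selective_guided_step(x, pseudo_ref, region_mask, t, T, step_size=0.1):
    """One toy guidance step: pull x toward the pseudo-reference, but only
    inside the degraded regions given by region_mask (e.g. a scratch mask)."""
    w_struct, w_color = staged_guidance_schedule(t, T)
    struct_ref = np.convolve(pseudo_ref, KERNEL, mode="same")
    struct_x = np.convolve(x, KERNEL, mode="same")
    # Structure = low-frequency part; "color" stands in for the residual part.
    grad = w_struct * (struct_ref - struct_x) + \
           w_color * ((pseudo_ref - struct_ref) - (x - struct_x))
    return x + step_size * region_mask * grad

# Toy run: restore a masked region of a 1-D signal over T pseudo-timesteps.
T = 50
rng = np.random.default_rng(0)
pseudo_ref = np.sin(np.linspace(0, 3 * np.pi, 64))
mask = np.zeros(64)
mask[20:40] = 1.0                                # "scratch" region only
x = pseudo_ref + mask * rng.normal(0, 1.0, 64)   # degrade inside the mask
for t in reversed(range(T)):                     # t = T-1 ... 0
    x = selective_guided_step(x, pseudo_ref, mask, t, T)
err = np.abs((x - pseudo_ref) * mask).mean()
print(f"masked-region error after guidance: {err:.3f}")
```

Because the guidance gradient is multiplied by the region mask, pixels outside the scratch region are never altered, which is the mechanism the abstract invokes to avoid identity mismatch in intact regions.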