🤖 AI Summary
This work addresses the challenge of face super-resolution under severe degradation, where recovering fine structural details and identity-preserving features remains difficult. To this end, we propose SwinIFS, a novel framework that, for the first time, incorporates dense Gaussian keypoint heatmaps as structural priors to guide a lightweight Swin Transformer toward semantically critical regions directly at the input stage. By integrating a hierarchical multi-scale attention mechanism, our method achieves identity-consistent reconstruction at medium to high upscaling factors (up to 8×). Extensive experiments on CelebA demonstrate that SwinIFS significantly outperforms existing approaches, producing sharper, more photorealistic images while effectively preserving facial structure even under extreme magnification, thus achieving both high reconstruction quality and computational efficiency.
📝 Abstract
Face super-resolution aims to recover high-quality facial images from severely degraded low-resolution inputs, but remains challenging due to the loss of fine structural details and identity-specific features. This work introduces SwinIFS, a landmark-guided super-resolution framework that integrates structural priors with hierarchical attention mechanisms to achieve identity-preserving reconstruction at both moderate and extreme upscaling factors. The method incorporates dense Gaussian heatmaps of key facial landmarks into the input representation, enabling the network to focus on semantically important facial regions from the earliest stages of processing. A compact Swin Transformer backbone is employed to capture long-range contextual information while preserving local geometry, allowing the model to restore subtle facial textures and maintain global structural consistency. Extensive experiments on the CelebA benchmark demonstrate that SwinIFS achieves superior perceptual quality, sharper reconstructions, and improved identity retention; it consistently produces more photorealistic results and exhibits strong performance even under 8× magnification, where most methods fail to recover meaningful structure. SwinIFS also provides an advantageous balance between reconstruction accuracy and computational efficiency, making it suitable for real-world applications in facial enhancement, surveillance, and digital restoration. Our code, model weights, and results are available at https://github.com/Habiba123-stack/SwinIFS.
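The landmark-guided input described in the abstract can be illustrated with a minimal NumPy sketch. It renders one Gaussian heatmap per landmark and stacks the heatmaps onto the RGB channels of the low-resolution image. This is not the authors' implementation: the function names `landmark_heatmaps` and `build_input`, the value of `sigma`, the 16×16 image size, the number of landmarks, and the landmark coordinates are all assumptions chosen for illustration.

```python
import numpy as np


def landmark_heatmaps(landmarks, height, width, sigma=1.5):
    """Render one 2D Gaussian heatmap per (x, y) landmark.

    Returns an array of shape (num_landmarks, height, width) with values
    in [0, 1], peaking at 1 on each landmark position.
    """
    ys = np.arange(height, dtype=np.float32)[:, None]  # row indices as a column
    xs = np.arange(width, dtype=np.float32)[None, :]   # column indices as a row
    maps = []
    for x, y in landmarks:
        sq_dist = (xs - x) ** 2 + (ys - y) ** 2  # broadcasts to (height, width)
        maps.append(np.exp(-sq_dist / (2.0 * sigma ** 2)))
    return np.stack(maps).astype(np.float32)


def build_input(rgb, landmarks, sigma=1.5):
    """Concatenate RGB channels with landmark heatmaps along the channel axis."""
    _, height, width = rgb.shape
    heatmaps = landmark_heatmaps(landmarks, height, width, sigma)
    return np.concatenate([rgb, heatmaps], axis=0)


# Hypothetical 16x16 low-resolution face with five illustrative landmarks
# (eyes, nose tip, mouth corners); the coordinates are made up for this example.
lr_face = np.zeros((3, 16, 16), dtype=np.float32)
points = [(5, 6), (10, 6), (8, 9), (6, 12), (10, 12)]
net_input = build_input(lr_face, points)
print(net_input.shape)  # (8, 16, 16)
```

In this sketch the resulting tensor has three image channels followed by one channel per landmark, so a backbone consuming it sees the structural prior from its first layer onward. How SwinIFS itself normalizes, scales, or merges the heatmaps is not specified in the abstract.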