Towards Facial Image Compression with Consistency Preserving Diffusion Prior

📅 2025-05-09

📈 Citations: 0

✨ Influential: 0

career value

248K/year

🤖 AI Summary

Existing low-bitrate face image compression methods suffer from poor reconstruction quality, loss of high-frequency details, and degraded performance on downstream tasks (e.g., face recognition). To address these issues, we propose FaSDiff, a frequency-domain consistent compression framework leveraging Stable Diffusion priors. FaSDiff employs a frequency-aware compressor to decouple low- and high-frequency components, and integrates a hybrid low-frequency enhancement module with a frequency-domain modulation mechanism to jointly optimize perceptual fidelity and machine-readable semantic consistency. The method is trained end-to-end without post-processing. Extensive experiments demonstrate that FaSDiff significantly outperforms state-of-the-art approaches across multiple benchmarks: at ultra-low bitrates (0.1–0.5 bpp), it achieves PSNR/SSIM gains of 1.2–2.8 dB and improves face recognition accuracy by 3.5–7.1%. To our knowledge, FaSDiff is the first method to achieve a unified balance between visual quality and semantic usability in low-bitrate face compression.

Technology Category

Application Category

📝 Abstract

With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reconstructed images that perform poorly in downstream applications due to insufficient preservation of high-frequency information. To further explore the diffusion prior in facial image compression, we propose Facial Image Compression with a Stable Diffusion Prior (FaSDiff), a method that preserves consistency through frequency enhancement. FaSDiff employs a high-frequency-sensitive compressor in an end-to-end framework to capture fine image details and produce robust visual prompts. Additionally, we introduce a hybrid low-frequency enhancement module that disentangles low-frequency facial semantics and stably modulates the diffusion prior alongside visual prompts. The proposed modules allow FaSDiff to leverage diffusion priors for superior human visual perception while minimizing performance loss in machine vision due to semantic inconsistency. Extensive experiments show that FaSDiff outperforms state-of-the-art methods in balancing human visual quality and machine vision accuracy. The code will be released after the paper is accepted.

Problem

Research questions and friction points this paper is trying to address.

Improving facial image compression quality at low bit rates

Preserving high-frequency details in diffusion-based compression

Balancing human visual perception and machine vision accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

High-frequency-sensitive compressor captures fine details

Hybrid low-frequency enhancement module disentangles semantics

Stable diffusion prior balances visual and machine performance

🔎 Similar Papers

PSC: Posterior Sampling-Based Compression