Towards Facial Image Compression with Consistency Preserving Diffusion Prior

📅 2025-05-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing low-bitrate face image compression methods suffer from poor reconstruction quality, loss of high-frequency details, and degraded performance on downstream tasks (e.g., face recognition). To address these issues, we propose FaSDiff, a frequency-domain consistent compression framework leveraging Stable Diffusion priors. FaSDiff employs a frequency-aware compressor to decouple low- and high-frequency components, and integrates a hybrid low-frequency enhancement module with a frequency-domain modulation mechanism to jointly optimize perceptual fidelity and machine-readable semantic consistency. The method is trained end-to-end without post-processing. Extensive experiments demonstrate that FaSDiff significantly outperforms state-of-the-art approaches across multiple benchmarks: at ultra-low bitrates (0.1–0.5 bpp), it achieves PSNR/SSIM gains of 1.2–2.8 dB and improves face recognition accuracy by 3.5–7.1%. To our knowledge, FaSDiff is the first method to achieve a unified balance between visual quality and semantic usability in low-bitrate face compression.

Technology Category

Application Category

📝 Abstract
With the widespread application of facial image data across various domains, the efficient storage and transmission of facial images has garnered significant attention. However, the existing learned face image compression methods often produce unsatisfactory reconstructed image quality at low bit rates. Simply adapting diffusion-based compression methods to facial compression tasks results in reconstructed images that perform poorly in downstream applications due to insufficient preservation of high-frequency information. To further explore the diffusion prior in facial image compression, we propose Facial Image Compression with a Stable Diffusion Prior (FaSDiff), a method that preserves consistency through frequency enhancement. FaSDiff employs a high-frequency-sensitive compressor in an end-to-end framework to capture fine image details and produce robust visual prompts. Additionally, we introduce a hybrid low-frequency enhancement module that disentangles low-frequency facial semantics and stably modulates the diffusion prior alongside visual prompts. The proposed modules allow FaSDiff to leverage diffusion priors for superior human visual perception while minimizing performance loss in machine vision due to semantic inconsistency. Extensive experiments show that FaSDiff outperforms state-of-the-art methods in balancing human visual quality and machine vision accuracy. The code will be released after the paper is accepted.
Problem

Research questions and friction points this paper is trying to address.

Improving facial image compression quality at low bit rates
Preserving high-frequency details in diffusion-based compression
Balancing human visual perception and machine vision accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-frequency-sensitive compressor captures fine details
Hybrid low-frequency enhancement module disentangles semantics
Stable diffusion prior balances visual and machine performance
🔎 Similar Papers
No similar papers found.
Y
Yimin Zhou
Tsinghua Shenzhen International Graduate School
Y
Yichong Xia
Tsinghua Shenzhen International Graduate School, Peng Cheng Laboratory
B
Bin Chen
Harbin Institute of Technology, Shenzhen, Peng Cheng Laboratory
B
Baoyi An
Huawei Technologies Company Ltd.
H
Haoqian Wang
Peng Cheng Laboratory
Z
Zhi Wang
Tsinghua Shenzhen International Graduate School
Yaowei Wang
Yaowei Wang
The Hong Kong Polytechnic University
Zikun Zhou
Zikun Zhou
Unknown affiliation
machine learningdeep learningvisual trackingdetection