Switchable Token-Specific Codebook Quantization For Face Image Compression

📅 2025-10-26
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing global shared codebook approaches neglect intra-face semantic correlations and token-level semantic disparities, leading to suboptimal reconstruction quality and face recognition performance at ultra-low bitrates (e.g., 0.05 bpp). To address this, we propose a switchable token-specific codebook quantization framework: first, codebooks are learned independently per semantic category; then, each visual token is dynamically assigned its most suitable dedicated codebook, enabling fine-grained, low-distortion quantization. Our method is the first to jointly couple token-level codebook selection with category-aware grouping—reducing individual codebook size while enhancing representational diversity and quantization fidelity. Experiments demonstrate that reconstructed face images achieve a mean recognition accuracy of 93.51% at 0.05 bpp, significantly outperforming global codebook baselines. This work establishes a novel paradigm for codebook-driven face compression models.

📝 Abstract
With the ever-increasing volume of visual data, its efficient and lossless transmission, along with subsequent interpretation and understanding, has become a critical bottleneck in modern information systems. Emerging codebook-based solutions utilize a globally shared codebook to quantize and dequantize each token, controlling the bpp by adjusting the number of tokens or the codebook size. However, for facial images, which are rich in attributes, such global codebook strategies overlook both the category-specific correlations within images and the semantic differences among tokens, resulting in suboptimal performance, especially at low bpp. Motivated by these observations, we propose a Switchable Token-Specific Codebook Quantization for face image compression, which learns distinct codebook groups for different image categories and assigns an independent codebook to each token. By recording the codebook group to which each token belongs with a small number of bits, our method can reduce the loss incurred when decreasing the size of each codebook group. This enables a larger total number of codebooks under a lower overall bpp, thereby enhancing the expressive capability and improving reconstruction performance. Owing to its generalizable design, our method can be integrated into any existing codebook-based representation learning approach and has demonstrated its effectiveness on face recognition datasets, achieving an average accuracy of 93.51% for reconstructed images at 0.05 bpp.
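The paper does not ship code; as a rough illustration of the token-to-codebook assignment the abstract describes, here is a minimal NumPy sketch. All sizes (group count G, codewords per group K, token dimension D) and the nearest-codeword selection rule are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: G codebook groups (one per semantic category),
# each holding K codewords of dimension D.
G, K, D = 4, 64, 16
codebooks = rng.standard_normal((G, K, D))  # stand-ins for learned per-category codebooks

def quantize_token(z):
    """Pick the group and codeword with the lowest squared distortion for token z."""
    dists = ((codebooks - z) ** 2).sum(axis=-1)   # distances to every codeword, shape (G, K)
    g, k = np.unravel_index(dists.argmin(), dists.shape)
    return int(g), int(k), codebooks[g, k]        # group id, codeword id, reconstruction

# Quantize a toy sequence of visual tokens; the transmitted stream is the
# (group index, codeword index) pair recorded for each token.
tokens = rng.standard_normal((8, D))
indices = [quantize_token(z)[:2] for z in tokens]
```

Dynamically switching groups per token is what lets each group stay small (low index cost) while the pool of codewords across all groups stays large.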
Problem

Research questions and friction points this paper is trying to address.

Improves face image compression at low bit rates
Addresses limitations of global codebook quantization methods
Enhances reconstruction quality through token-specific codebook groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns distinct codebook groups for image categories
Assigns independent codebook to each token
Records token-codebook mapping with minimal bits
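The last bullet implies a simple bitrate budget: each token pays a few bits for its group index on top of the codeword index. With purely illustrative numbers (none of these sizes come from the paper), the bookkeeping works out as:

```python
import math

# Illustrative, assumed numbers: a 256x256 face image encoded as 256 tokens,
# with G = 4 codebook groups of K = 16 codewords each.
H, W = 256, 256
num_tokens = 256
G, K = 4, 16

# Each token costs log2(G) bits to record its group plus log2(K) bits
# for the codeword index within that group.
bits_per_token = math.log2(G) + math.log2(K)  # 2 + 4 = 6 bits
bpp = num_tokens * bits_per_token / (H * W)   # 1536 / 65536 ≈ 0.023 bpp
```

Note that one global codebook of G x K = 64 entries would cost the same 6 bits per token; the claimed gain is that each of the four smaller groups can specialize to its semantic category, the "enhanced expressive capability" the abstract refers to.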
👥 Authors
Yongbo Wang
East China Normal University, Shanghai, China
Haonan Wang
Tencent Youtu Lab, Shanghai, China
Guodong Mu
Tencent Youtu Lab, Shanghai, China
Ruixin Zhang
Tencent
Jiaqi Chen
East China Normal University, Shanghai, China
Jingyun Zhang
PhD student, Beihang University
Jun Wang
Tencent WeChat Pay Lab, Shenzhen, China
Yuan Xie
East China Normal University, Shanghai, China
Zhizhong Zhang
Associate Researcher, East China Normal University
Shouhong Ding
Tencent Youtu Lab, Shanghai, China