Dual Codebook VQ: Enhanced Image Reconstruction with Reduced Codebook Size

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vector quantization (VQ)-based image reconstruction has long suffered from low codebook utilization, resulting in strong coupling between reconstruction fidelity and codebook size. To address this, we propose a global-local dual-codebook co-design mechanism: a lightweight Transformer dynamically updates the global codebook to model semantic consistency, while a deterministic feature selection strategy constructs a local codebook to capture fine-grained structural details; both are jointly optimized end-to-end. Our method requires no pretraining and achieves, for the first time, efficient VQ reconstruction trained from scratch. At a modest codebook size of 512, it achieves significantly lower FID than state-of-the-art methods using thousand-entry codebooks, particularly excelling in face and complex-scene reconstruction. Moreover, it reduces computational overhead by over 40%, achieving an unprecedented balance between high fidelity and high efficiency.

📝 Abstract
Vector Quantization (VQ) techniques face significant challenges in codebook utilization, limiting reconstruction fidelity in image modeling. We introduce a Dual Codebook mechanism that effectively addresses this limitation by partitioning the representation into complementary global and local components. The global codebook employs a lightweight transformer for concurrent updates of all code vectors, while the local codebook maintains precise feature representation through deterministic selection. This complementary approach is trained from scratch without requiring pre-trained knowledge. Experimental evaluation across multiple standard benchmark datasets demonstrates state-of-the-art reconstruction quality while using a compact codebook of size 512, half the size of previous methods that require pre-training. Our approach achieves significant FID improvements across diverse image domains, particularly excelling in scene and face reconstruction tasks. These results establish Dual Codebook VQ as an efficient paradigm for high-fidelity image reconstruction with significantly reduced computational requirements.
Problem

Research questions and friction points this paper is trying to address.

Improves image reconstruction fidelity with reduced codebook size.
Introduces Dual Codebook for global and local feature representation.
Achieves state-of-the-art results without pre-trained models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Codebook mechanism enhances image reconstruction.
Global codebook uses lightweight transformer for updates.
Local codebook ensures precise feature representation.
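The dual-codebook idea described above can be illustrated with a toy quantization step: split each latent vector into a global and a local part, and look each part up in its own codebook. This is a minimal sketch under assumed names and an assumed equal global/local split; the paper's transformer-based codebook updates, deterministic selection strategy, and end-to-end training are not reproduced here.

```python
import numpy as np

def dual_codebook_quantize(z, global_cb, local_cb):
    """Toy dual-codebook VQ sketch (illustrative, not the paper's implementation).

    z: (N, D) latent vectors. The first D//2 dimensions are quantized against
    the global codebook, the remaining dimensions against the local codebook.
    global_cb, local_cb: (K, D//2) codebooks.
    Returns the quantized vectors and the (global, local) code indices.
    """
    d = z.shape[1] // 2
    z_global, z_local = z[:, :d], z[:, d:]

    def nearest(x, cb):
        # Euclidean nearest-neighbour lookup: index of the closest code vector.
        dists = ((x[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        return dists.argmin(axis=1)

    idx_g = nearest(z_global, global_cb)
    idx_l = nearest(z_local, local_cb)
    # Reconstruct by concatenating the selected global and local code vectors.
    z_q = np.concatenate([global_cb[idx_g], local_cb[idx_l]], axis=1)
    return z_q, (idx_g, idx_l)
```

With two codebooks of 256 entries each, the combined budget matches the paper's compact 512-entry setting while the global and local parts specialize independently.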
Parisa Boodaghi Malidarreh
PhD student at UTA
machine learning, bioinformatics, artificial neural network
Jillur Rahman Saurav
PhD Student and Graduate Research Assistant, Luber Lab at The University of Texas at Arlington
Medical Imaging, GenAI, NLP, Computer Vision, Data Science
Thuong Le Hoai Pham
Engineering & Computer PhD Student, University of Texas at Arlington
Amir Hajighasemi
PhD student in CS, University of Texas at Arlington
Machine learning, Computer Vision, Medical Imaging
Anahita Samadi
Department of Computer Science, The University of Texas at Arlington
Saurabh Maydeo
Department of Computer Science, The University of Texas at Arlington
M. Nasr
Department of Computer Science, The University of Texas at Arlington
Jacob M. Luber
Department of Computer Science, The University of Texas at Arlington