Texture Vector-Quantization and Reconstruction Aware Prediction for Generative Super-Resolution

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vector quantization (VQ) methods for generative super-resolution suffer from two key limitations: (1) coarse-grained quantization of all visual features, leading to significant texture degradation and high quantization distortion; and (2) index predictors trained solely with codebook-level supervision, neglecting end-to-end reconstruction fidelity. To address these, we propose Texture Vector Quantization (TVQ) and Reconstruction-Aware Prediction (RAP). TVQ selectively models only high-frequency missing textures—bypassing low-frequency content—thereby substantially reducing quantization artifacts. RAP incorporates a straight-through estimator to enable end-to-end, image-level supervision, directly optimizing index prediction for reconstruction quality. Our approach achieves marked improvements in texture realism and structural consistency of super-resolved images with minimal computational overhead. Extensive experiments demonstrate state-of-the-art performance over prevailing VQ-based methods across multiple benchmarks, validating the synergistic enhancement of prior modeling accuracy and final reconstruction quality.
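The coarse nearest-codebook assignment that the summary criticizes can be sketched as follows. This is a minimal numpy illustration of plain VQ encoding, not the paper's code; `quantize` and all names are ours. Each feature vector is replaced by its closest codebook entry, and the residual between feature and code is the quantization distortion that TVQ aims to reduce by restricting the codebook to high-frequency missing textures.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature row to its nearest codebook row (squared L2 distance)."""
    # dists[i, j] = ||features[i] - codebook[j]||^2 via broadcasting
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)                 # hard code assignment
    quantized = codebook[indices]                  # replace feature by its code
    error = ((features - quantized) ** 2).mean()   # quantization distortion
    return indices, quantized, error

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))    # 16 codes of dimension 4
features = rng.normal(size=(8, 4))     # 8 feature vectors to encode
idx, q, err = quantize(features, codebook)
```

Because natural image features are rich and continuous, the residual `err` is generally nonzero; it vanishes only when a feature coincides exactly with a codebook entry.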

📝 Abstract
Vector-quantization (VQ) based models have recently demonstrated strong potential for visual prior modeling. However, existing VQ-based methods simply encode visual features with the nearest codebook items and train the index predictor with code-level supervision. Due to the richness of visual signals, VQ encoding often leads to large quantization errors. Furthermore, training the predictor with code-level supervision cannot take the final reconstruction error into consideration, resulting in sub-optimal prior modeling accuracy. In this paper, we address these two issues and propose a Texture Vector-Quantization strategy and a Reconstruction Aware Prediction strategy. The texture vector-quantization strategy leverages the task characteristics of super-resolution and introduces the codebook only to model the prior of missing textures, while the reconstruction aware prediction strategy makes use of the straight-through estimator to train the index predictor directly with image-level supervision. Our proposed generative SR model (TVQ&RAP) delivers photo-realistic SR results at a small computational cost.
Problem

Research questions and friction points this paper is trying to address.

Reduces quantization error in vector-quantized super-resolution models
Improves prior modeling accuracy with reconstruction-aware training
Generates photo-realistic super-resolution results with low computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Texture Vector-Quantization models missing textures prior
Reconstruction Aware Prediction uses image-level supervision
Straight-through estimator trains index predictor directly
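The straight-through idea behind the third point can be sketched as follows, in a minimal numpy illustration that is ours, not the paper's implementation (`ste_quantize` and all names are assumptions). The predictor outputs logits over codebook entries; the forward pass uses the hard code lookup, while in an autodiff framework the gradient would flow through the soft, differentiable path, so an image-level reconstruction loss can train the index predictor end-to-end.

```python
import numpy as np

def ste_quantize(logits, codebook):
    """Straight-through code selection: hard lookup forward, soft path for gradients."""
    # softmax over code indices: the differentiable surrogate path
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    soft = probs @ codebook                # expected code (differentiable)
    hard = codebook[probs.argmax(-1)]      # hard argmax lookup used at inference
    # straight-through trick: forward value equals `hard`; in an autodiff
    # framework (e.g. with a detach/stop-gradient on the bracketed term)
    # gradients would follow `soft` instead of the non-differentiable argmax
    return soft + (hard - soft)

rng = np.random.default_rng(1)
codebook = rng.normal(size=(16, 4))        # 16 codes of dimension 4
logits = rng.normal(size=(8, 16))          # predictor scores for 8 positions
codes = ste_quantize(logits, codebook)
```

Numerically the forward output matches the hard codebook lookup; the soft term matters only for the backward pass, which is exactly what lets image-level supervision reach the predictor.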
Qifan Li
University of Electronic Science and Technology of China
Jiale Zou
University of Electronic Science and Technology of China
Jinhua Zhang
University of Electronic Science and Technology of China
Wei Long
University of Electronic Science and Technology of China
Xinyu Zhou
University of Electronic Science and Technology of China
Shuhang Gu
University of Electronic Science and Technology of China
Visual Generation
image processing · pattern recognition · computer vision