🤖 AI Summary
Navigating GAN latent spaces under few-shot conditions is difficult and degrades consistency regularization (CR). To address this, the paper proposes a Style-Space Disentanglement and Learnable Vector Quantization (VQ) framework. First, it maps the continuous latent space into a semantically disentangled style space. Second, it introduces an optimal transport alignment mechanism, guided by foundation model features, to construct a semantically aligned discrete codebook. Finally, it integrates feature distillation with enhanced CR. This work pioneers a style-space quantization paradigm, enabling effective external knowledge injection and semantic enrichment of the codebook. Experiments demonstrate that, under low-data regimes, the method reduces FID by over 20%, significantly improves discriminator robustness and generation consistency, and substantially enhances CR stability.
📝 Abstract
Under limited data settings, GANs often struggle to navigate and effectively exploit the input latent space. Consequently, images generated from adjacent variables in a sparse input latent space may exhibit significant discrepancies in realism, leading to suboptimal consistency regularization (CR) outcomes. To address this, we propose *SQ-GAN*, a novel approach that enhances CR by introducing a style space quantization scheme. This method transforms the sparse, continuous input latent space into a compact, structured discrete proxy space, allowing each element to correspond to a specific real data point, thereby improving CR performance. Instead of direct quantization, we first map the input latent variables into a less entangled "style" space and apply quantization using a learnable codebook. This enables each quantized code to control distinct factors of variation. Additionally, we minimize the optimal transport distance to align the codebook codes with features extracted from the training data by a foundation model, embedding external knowledge into the codebook and establishing a semantically rich vocabulary that properly describes the training dataset. Extensive experiments demonstrate significant improvements in both discriminator robustness and generation quality with our method.
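The two core mechanisms described above can be illustrated with a minimal numpy sketch: (1) nearest-code lookup, which snaps each style vector to its closest entry in a learnable codebook, and (2) entropy-regularized optimal transport (Sinkhorn iterations) between codebook codes and foundation-model features, whose transport cost can serve as the alignment loss. This is a simplified illustration under assumed shapes and uniform marginals, not the paper's actual implementation; all function and variable names here are hypothetical.

```python
import numpy as np

def quantize(w, codebook):
    """Map each style vector to its nearest codebook entry (Euclidean).

    w: (batch, d) style vectors; codebook: (K, d) learnable codes.
    Returns the quantized vectors and the chosen code indices.
    """
    d2 = ((w[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (batch, K)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx

def sinkhorn_alignment(codebook, feats, reg=0.1, n_iter=100):
    """Entropy-regularized OT between codebook codes and external features.

    Returns the transport plan P and the transport cost <P, C>, which can
    be minimized w.r.t. the codebook to pull codes toward the feature
    distribution (uniform marginals assumed for simplicity).
    """
    K, N = codebook.shape[0], feats.shape[0]
    C = ((codebook[:, None, :] - feats[None, :, :]) ** 2).sum(-1)
    C = C / C.max()                      # normalize to avoid exp underflow
    Kmat = np.exp(-C / reg)              # Gibbs kernel
    a = np.ones(K) / K                   # uniform marginal over codes
    b = np.ones(N) / N                   # uniform marginal over features
    u, v = np.ones(K), np.ones(N)
    for _ in range(n_iter):              # alternating marginal projections
        u = a / (Kmat @ v)
        v = b / (Kmat.T @ u)
    P = u[:, None] * Kmat * v[None, :]
    return P, float((P * C).sum())
```

In training, the quantization step would typically be made differentiable with a straight-through estimator, and the OT cost would be added to the generator/codebook objective; both are omitted here for brevity.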