Exploiting Discriminative Codebook Prior for Autoregressive Image Generation

📅 2025-08-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current autoregressive image generation models are trained only on token indices and do not exploit the token similarity prior embedded in the learned codebook. Naive k-means clustering models token-level similarity poorly because of token space disparity and inaccurate centroid-based distances. The paper proposes the Discriminative Codebook Prior Extractor (DCPE), a plug-and-play module that replaces centroid-based distance with instance-based distance and uses bottom-up agglomerative merging to extract the codebook prior. DCPE integrates with existing codebook prior-based paradigms. On LlamaGen-B, it accelerates training by 42% and improves final FID and Inception Score (IS).

📝 Abstract

Advanced discrete token-based autoregressive image generation systems first tokenize images into sequences of token indices with a codebook, and then model these sequences in an autoregressive paradigm. Because autoregressive generative models are trained only on index values, the prior encoded in the codebook, which contains rich token similarity information, is not exploited. Recent studies have attempted to incorporate this prior by performing naive k-means clustering on the tokens, which facilitates training generative models with a reduced codebook. However, we reveal that k-means clustering performs poorly in the codebook feature space due to two inherent issues: token space disparity and centroid distance inaccuracy. In this work, we propose the Discriminative Codebook Prior Extractor (DCPE) as an alternative to k-means clustering for more effectively mining and utilizing the token similarity information embedded in the codebook. DCPE replaces the commonly used centroid-based distance, which is found to be unsuitable and inaccurate for the token feature space, with a more reasonable instance-based distance. Using an agglomerative merging technique, it further addresses the token space disparity issue: it avoids splitting high-density regions and aggregates low-density ones. Extensive experiments demonstrate that DCPE is plug-and-play and integrates seamlessly with existing codebook prior-based paradigms. With the discriminative prior extracted, DCPE accelerates the training of autoregressive models by 42% on LlamaGen-B and improves final FID and IS performance.
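
The agglomerative merging described in the abstract can be sketched on a toy codebook. This is a minimal illustration, not the authors' implementation: the abstract does not define the instance-based distance, so average pairwise distance between cluster members (average linkage) is used here as an assumed stand-in, and the names `instance_distance`, `agglomerative_merge`, and `toy_codebook` are hypothetical.

```python
import math
from itertools import combinations


def instance_distance(cluster_a, cluster_b, embeddings):
    """Average pairwise distance between the members of two clusters.

    Assumption: average linkage stands in for the paper's instance-based
    distance, which the abstract does not define.
    """
    total = sum(
        math.dist(embeddings[i], embeddings[j])
        for i in cluster_a
        for j in cluster_b
    )
    return total / (len(cluster_a) * len(cluster_b))


def agglomerative_merge(embeddings, num_clusters):
    """Merge the closest pair of clusters until num_clusters remain.

    Returns a list that maps each token index to a cluster id.
    """
    clusters = [[i] for i in range(len(embeddings))]  # start from singletons
    while len(clusters) > num_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: instance_distance(
                clusters[pair[0]], clusters[pair[1]], embeddings
            ),
        )
        clusters[a].extend(clusters[b])
        del clusters[b]  # a < b, so index a is unaffected by the deletion
    labels = [0] * len(embeddings)
    for cluster_id, members in enumerate(clusters):
        for token in members:
            labels[token] = cluster_id
    return labels


# Toy codebook: four tokens in a dense region and two isolated tokens.
toy_codebook = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense region
    (5.0, 5.0), (9.0, 1.0),                          # sparse region
]
toy_labels = agglomerative_merge(toy_codebook, num_clusters=2)
print(toy_labels)
```

In this toy case the dense group ends up in one cluster and the two isolated tokens are aggregated into the other. The brute-force pair search is only practical for small inputs; a real codebook with thousands of entries would need a more efficient linkage routine.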

Problem

Research questions and friction points this paper is trying to address.

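Autoregressive image generators train only on token indices and leave the codebook's token similarity information unused

Naive k-means clustering extracts the codebook prior poorly in the codebook feature space

Token space disparity and centroid distance inaccuracy are the underlying causes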
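
One way a centroid-based distance can mislead is shown in the toy comparison below. It is a hedged illustration only: whether this is the exact failure mode the paper analyzes is an assumption, and the groups and helper names are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a group of points."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def centroid_distance(group_a, group_b):
    """Distance between the centroids of two groups."""
    return math.dist(centroid(group_a), centroid(group_b))


def nearest_instance_distance(group_a, group_b):
    """Smallest distance between any member of one group and any of the other."""
    return min(math.dist(a, b) for a in group_a for b in group_b)


# Two elongated groups that almost touch, plus a compact group off to the side.
group_a = [(0.0, 0.0), (10.0, 0.0)]
group_b = [(11.0, 0.0), (21.0, 0.0)]
group_c = [(5.0, 6.0), (5.0, 6.5)]

# Centroids rank group_c as the closer neighbor of group_a ...
print(centroid_distance(group_a, group_b), centroid_distance(group_a, group_c))
# ... while the instances show that group_b nearly touches group_a.
print(nearest_instance_distance(group_a, group_b),
      nearest_instance_distance(group_a, group_c))
```

The two measures disagree about which group is the nearest neighbor of `group_a`, which is the kind of discrepancy an instance-based distance is meant to avoid.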

Innovation

Methods, ideas, or system contributions that make the work stand out.

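DCPE replaces centroid-based distance with instance-based distance

DCPE uses agglomerative merging to address token space disparity: it avoids splitting high-density regions and aggregates low-density ones

DCPE accelerates autoregressive training by 42% on LlamaGen-B and improves final FID and IS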
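
Once tokens are grouped, a reduced codebook can be obtained by mapping each token index to its cluster id. The sketch below assumes a precomputed token-to-cluster assignment (`token_to_cluster`, hypothetical values) and only shows the remapping step; how a codebook prior-based training paradigm consumes the coarse sequence is not described in the abstract.

```python
def remap_to_clusters(token_sequence, token_to_cluster):
    """Replace each token index with the id of the cluster it belongs to."""
    return [token_to_cluster[token] for token in token_sequence]


# Hypothetical assignment for a six-entry codebook grouped into two clusters.
token_to_cluster = [0, 0, 0, 0, 1, 1]

# A tokenized image, as a sequence of codebook indices.
token_sequence = [3, 4, 0, 5, 1]

coarse_sequence = remap_to_clusters(token_sequence, token_to_cluster)
reduced_vocab_size = len(set(token_to_cluster))

print(coarse_sequence)     # [0, 1, 0, 1, 0]
print(reduced_vocab_size)  # 2
```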
