LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization

📅 2026-02-17
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses optimization instability, low codebook utilization, and representation collapse in large-scale discrete image tokenization by proposing Learnable Geometric Quantization (LGQ). LGQ introduces the first end-to-end differentiable framework for learning quantization geometry, replacing hard nearest-neighbor assignment with a temperature-controlled soft assignment derived from the posterior responsibilities of an isotropic Gaussian mixture model, while recovering hard assignment at inference. By incorporating token-level peakedness and global usage regularizers, LGQ achieves high-confidence and balanced codebook utilization without relying on fixed grids. On ImageNet with a 16K codebook, LGQ improves rFID by 11.88% over FSQ while reducing active codes by 49.96%, and outperforms SimVQ with a 6.06% rFID gain alongside a 49.45% reduction in effective representation rate.
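To make the mechanism concrete, here is a minimal PyTorch sketch of temperature-controlled soft assignment and the two entropy-style regularizers, assuming uniform mixture weights and a shared isotropic variance folded into the temperature. The function names (soft_assign_quantize, lgq_regularizers) and the exact regularizer forms are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def soft_assign_quantize(z, codebook, temperature=1.0, hard=False):
    # z: (N, D) encoder outputs; codebook: (K, D) code vectors (mixture means).
    # With uniform mixing weights and isotropic covariance, the posterior
    # responsibility of code k is softmax(-||z - e_k||^2 / temperature).
    d2 = torch.cdist(z, codebook).pow(2)          # (N, K) squared distances
    resp = F.softmax(-d2 / temperature, dim=-1)   # soft assignments
    if hard:
        # Inference: recover hard nearest-neighbor lookup
        idx = d2.argmin(dim=-1)
        return codebook[idx], idx
    # Training: fully differentiable convex combination of codes
    return resp @ codebook, resp

def lgq_regularizers(resp, eps=1e-8):
    # Token-level peakedness: mean per-token assignment entropy; driving it
    # down pushes each token toward a confident, near one-hot assignment.
    peakedness = -(resp * (resp + eps).log()).sum(dim=-1).mean()
    # Global usage: negative entropy of the batch-averaged code distribution;
    # driving it down encourages balanced utilization across the codebook.
    avg = resp.mean(dim=0)
    usage = (avg * (avg + eps).log()).sum()
    return peakedness, usage

# Example: 8 latents, 16 codes, 4-dim embeddings
z, codebook = torch.randn(8, 4), torch.randn(16, 4)
z_q, resp = soft_assign_quantize(z, codebook, temperature=0.5)
peak, usage = lgq_regularizers(resp)
```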

📝 Abstract
Discrete image tokenization is a key bottleneck for scalable visual generation: a tokenizer must remain compact for efficient latent-space priors while preserving semantic structure and using discrete capacity effectively. Existing quantizers face a trade-off: vector-quantized tokenizers learn flexible geometries but often suffer from biased straight-through optimization, codebook under-utilization, and representation collapse at large vocabularies. Structured scalar or implicit tokenizers ensure stable, near-complete utilization by design, yet rely on fixed discretization geometries that may allocate capacity inefficiently under heterogeneous latent statistics. We introduce Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns discretization geometry end-to-end. LGQ replaces hard nearest-neighbor lookup with temperature-controlled soft assignments, enabling fully differentiable training while recovering hard assignments at inference. The assignments correspond to posterior responsibilities of an isotropic Gaussian mixture and minimize a variational free-energy objective, provably converging to nearest-neighbor quantization in the low-temperature limit. LGQ combines a token-level peakedness regularizer with a global usage regularizer to encourage confident yet balanced code utilization without imposing rigid grids. Under a controlled VQGAN-style backbone on ImageNet across multiple vocabulary sizes, LGQ achieves stable optimization and balanced utilization. At 16K codebook size, LGQ improves rFID by 11.88% over FSQ while using 49.96% fewer active codes, and improves rFID by 6.06% over SimVQ with 49.45% lower effective representation rate, achieving comparable fidelity with substantially fewer active entries. Our GitHub repository is available at: https://github.com/KurbanIntelligenceLab/LGQ
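For concreteness, the soft assignment described above can be written out under the abstract's isotropic-Gaussian reading; the notation (responsibilities r_k, codes e_k, temperature τ) is ours and is meant as a sketch, not the paper's exact formulation:

```latex
% Responsibility of code e_k for latent z, with uniform mixture weights
% and shared isotropic variance absorbed into the temperature tau:
r_k(z) = \frac{\exp\!\left(-\lVert z - e_k \rVert^{2}/\tau\right)}
              {\sum_{j=1}^{K} \exp\!\left(-\lVert z - e_j \rVert^{2}/\tau\right)},
\qquad
z_q = \sum_{k=1}^{K} r_k(z)\, e_k .

% Low-temperature limit: the soft assignment collapses onto the nearest
% code, recovering standard nearest-neighbor quantization:
\lim_{\tau \to 0^{+}} r_k(z) =
\begin{cases}
1, & k = \arg\min_{j} \lVert z - e_j \rVert^{2}, \\
0, & \text{otherwise.}
\end{cases}
```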
Problem

Research questions and friction points this paper is trying to address.

discrete image tokenization
vector quantization
codebook utilization
discretization geometry
scalable visual generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable Geometric Quantization
differentiable tokenization
soft assignment
codebook utilization
discrete image representation
Idil Bilge Altun
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Mert Onur Cakiroglu
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Elham Buxton
University of Illinois Springfield, Computer Science
Mehmet Dalkilic
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Hasan Kurban
Hamad Bin Khalifa University
Artificial Intelligence · Software Engineering · AI for Science