LGQ: Learning Discretization Geometry for Scalable and Stable Image Tokenization

📅 2026-02-17
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses optimization instability, low codebook utilization, and representation collapse in large-scale discrete image tokenization by proposing Learnable Geometric Quantization (LGQ). LGQ introduces the first end-to-end differentiable framework for learning quantization geometry, replacing hard nearest-neighbor assignment with a temperature-controlled soft assignment derived from the posterior responsibilities of an isotropic Gaussian mixture model, while recovering hard assignment at inference. By incorporating token-level peakedness and global usage regularizers, LGQ achieves high-confidence and balanced codebook utilization without relying on fixed grids. On ImageNet with a 16K codebook, LGQ improves rFID by 11.88% over FSQ while reducing active codes by 49.96%, and outperforms SimVQ with a 6.06% rFID gain alongside a 49.45% reduction in effective representation rate.
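To make the mechanism concrete, here is a minimal PyTorch sketch of temperature-controlled soft assignment and the two entropy-style regularizers, assuming uniform mixture weights and a shared isotropic variance folded into the temperature. The function names (soft_assign_quantize, lgq_regularizers) and the exact regularizer forms are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def soft_assign_quantize(z, codebook, temperature=1.0, hard=False):
    # z: (N, D) encoder outputs; codebook: (K, D) code vectors (mixture means).
    # With uniform mixing weights and isotropic covariance, the posterior
    # responsibility of code k is softmax(-||z - e_k||^2 / temperature).
    d2 = torch.cdist(z, codebook).pow(2)          # (N, K) squared distances
    resp = F.softmax(-d2 / temperature, dim=-1)   # soft assignments
    if hard:
        # Inference: recover hard nearest-neighbor lookup
        idx = d2.argmin(dim=-1)
        return codebook[idx], idx
    # Training: fully differentiable convex combination of codes
    return resp @ codebook, resp

def lgq_regularizers(resp, eps=1e-8):
    # Token-level peakedness: mean per-token assignment entropy; driving it
    # down pushes each token toward a confident, near one-hot assignment.
    peakedness = -(resp * (resp + eps).log()).sum(dim=-1).mean()
    # Global usage: negative entropy of the batch-averaged code distribution;
    # driving it down encourages balanced utilization across the codebook.
    avg = resp.mean(dim=0)
    usage = (avg * (avg + eps).log()).sum()
    return peakedness, usage

# Example: 8 latents, 16 codes, 4-dim embeddings
z, codebook = torch.randn(8, 4), torch.randn(16, 4)
z_q, resp = soft_assign_quantize(z, codebook, temperature=0.5)
peak, usage = lgq_regularizers(resp)
```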

📝 Abstract
Discrete image tokenization is a key bottleneck for scalable visual generation: a tokenizer must remain compact for efficient latent-space priors while preserving semantic structure and using discrete capacity effectively. Existing quantizers face a trade-off: vector-quantized tokenizers learn flexible geometries but often suffer from biased straight-through optimization, codebook under-utilization, and representation collapse at large vocabularies. Structured scalar or implicit tokenizers ensure stable, near-complete utilization by design, yet rely on fixed discretization geometries that may allocate capacity inefficiently under heterogeneous latent statistics. We introduce Learnable Geometric Quantization (LGQ), a discrete image tokenizer that learns discretization geometry end-to-end. LGQ replaces hard nearest-neighbor lookup with temperature-controlled soft assignments, enabling fully differentiable training while recovering hard assignments at inference. The assignments correspond to posterior responsibilities of an isotropic Gaussian mixture and minimize a variational free-energy objective, provably converging to nearest-neighbor quantization in the low-temperature limit. LGQ combines a token-level peakedness regularizer with a global usage regularizer to encourage confident yet balanced code utilization without imposing rigid grids. Under a controlled VQGAN-style backbone on ImageNet across multiple vocabulary sizes, LGQ achieves stable optimization and balanced utilization. At 16K codebook size, LGQ improves rFID by 11.88% over FSQ while using 49.96% fewer active codes, and improves rFID by 6.06% over SimVQ with 49.45% lower effective representation rate, achieving comparable fidelity with substantially fewer active entries. Our GitHub repository is available at: https://github.com/KurbanIntelligenceLab/LGQ
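For concreteness, the soft assignment described above can be written out under the abstract's isotropic-Gaussian reading; the notation (responsibilities r_k, codes e_k, temperature τ) is ours and is meant as a sketch, not the paper's exact formulation:

```latex
% Responsibility of code e_k for latent z, with uniform mixture weights
% and shared isotropic variance absorbed into the temperature tau:
r_k(z) = \frac{\exp\!\left(-\lVert z - e_k \rVert^{2}/\tau\right)}
              {\sum_{j=1}^{K} \exp\!\left(-\lVert z - e_j \rVert^{2}/\tau\right)},
\qquad
z_q = \sum_{k=1}^{K} r_k(z)\, e_k .

% Low-temperature limit: the soft assignment collapses onto the nearest
% code, recovering standard nearest-neighbor quantization:
\lim_{\tau \to 0^{+}} r_k(z) =
\begin{cases}
1, & k = \arg\min_{j} \lVert z - e_j \rVert^{2}, \\
0, & \text{otherwise.}
\end{cases}
```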
Problem

Research questions and friction points this paper is trying to address.

discrete image tokenization
vector quantization
codebook utilization
discretization geometry
scalable visual generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable Geometric Quantization
differentiable tokenization
soft assignment
codebook utilization
discrete image representation
Idil Bilge Altun
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Mert Onur Cakiroglu
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Elham Buxton
University of Illinois Springfield, Computer Science
Mehmet Dalkilic
Indiana University Bloomington, School of Informatics, Computing, and Engineering
Hasan Kurban
Hamad Bin Khalifa University
Artificial Intelligence · Software Engineering · AI for Science