Halton Scheduler For Masked Generative Image Transformer

📅 2025-03-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
MaskGIT’s token unmasking scheduler relies on confidence-based ranking, leading to irreversible sampling errors and high sensitivity to hyperparameters—necessitating model retraining. To address this, we propose a plug-and-play Halton quasi-random scheduling strategy that replaces confidence-driven unmasking with spatially uniform sampling, marking the first integration of low-discrepancy sequences into MaskGIT’s decoding process. Our method requires no model retraining or noise injection, significantly reducing sampling bias and mutual information loss. On ImageNet and COCO, it achieves 12.3% and 9.7% FID improvements, respectively, yielding images with enhanced detail and diversity while substantially simplifying hyperparameter tuning. The core contribution is the novel application of deterministic quasi-random sampling to improve both stability and generation quality in autoregressive masked modeling.

Technology Category

Application Category

📝 Abstract
Masked Generative Image Transformers (MaskGIT) have emerged as a scalable and efficient image generation framework, able to deliver high-quality visuals with low inference costs. However, MaskGIT's token unmasking scheduler, an essential component of the framework, has not received the attention it deserves. We analyze the sampling objective in MaskGIT, based on the mutual information between tokens, and elucidate its shortcomings. We then propose a new sampling strategy based on our Halton scheduler instead of the original Confidence scheduler. More precisely, our method selects the token's position according to a quasi-random, low-discrepancy Halton sequence. Intuitively, that method spreads the tokens spatially, progressively covering the image uniformly at each step. Our analysis shows that it allows reducing non-recoverable sampling errors, leading to simpler hyper-parameters tuning and better quality images. Our scheduler does not require retraining or noise injection and may serve as a simple drop-in replacement for the original sampling strategy. Evaluation of both class-to-image synthesis on ImageNet and text-to-image generation on the COCO dataset demonstrates that the Halton scheduler outperforms the Confidence scheduler quantitatively by reducing the FID and qualitatively by generating more diverse and more detailed images. Our code is at https://github.com/valeoai/Halton-MaskGIT.
Problem

Research questions and friction points this paper is trying to address.

Improving MaskGIT's token unmasking scheduler for better image generation
Proposing Halton scheduler to reduce non-recoverable sampling errors
Enhancing image quality and diversity without retraining or noise injection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Halton sequence for token selection
Reduces non-recoverable sampling errors
Improves image quality and diversity
🔎 Similar Papers
No similar papers found.