Demystifying MaskGIT Sampler and Beyond: Adaptive Order Selection in Masked Diffusion

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Masked diffusion models suffer from low sampling efficiency and limited theoretical interpretability, which hinders practical deployment. This paper uncovers an implicit temperature-based sampling mechanism in MaskGIT and proposes an analytically tractable “moment sampler” that follows a “choose-then-sample” paradigm, selecting unmasking positions before sampling tokens. It further introduces a partial caching strategy for Transformers and an adaptive, non-uniform unmasking schedule grounded in the exploration-exploitation trade-off. Built on the Transformer architecture, the method handles position selection and token generation within a single framework. Experiments on image and text generation report substantial acceleration (up to 3.2× fewer sampling steps) while preserving or improving generation quality. The work also provides a theoretical framework for analyzing masked diffusion samplers.

📝 Abstract

Masked diffusion models have shown promising performance in generating high-quality samples in a wide range of domains, but accelerating their sampling process remains relatively underexplored. To investigate efficient samplers for masked diffusion, this paper theoretically analyzes the MaskGIT sampler for image modeling, revealing its implicit temperature sampling mechanism. Through this analysis, we introduce the "moment sampler," an asymptotically equivalent but more tractable and interpretable alternative to MaskGIT, which employs a "choose-then-sample" approach by selecting unmasking positions before sampling tokens. In addition, we improve the efficiency of choose-then-sample algorithms through two key innovations: a partial caching technique for transformers that approximates longer sampling trajectories without proportional computational cost, and a hybrid approach formalizing the exploration-exploitation trade-off in adaptive unmasking. Experiments in image and text domains demonstrate our theory as well as the efficiency of our proposed methods, advancing both theoretical understanding and practical implementation of masked diffusion samplers.

Problem

Research questions and friction points this paper is trying to address.

Analyzing MaskGIT's implicit temperature sampling mechanism

Introducing a tractable choose-then-sample alternative to MaskGIT

Improving efficiency through caching and adaptive unmasking strategies
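To make the first point concrete, here is a minimal sketch of one MaskGIT-style unmasking step, written from the commonly described form of that sampler rather than from this paper: tokens are proposed at every masked position, and only the `k` proposals with the highest noisy confidence are kept. The names `MASK`, `maskgit_style_step`, and the toy logits are hypothetical, and a real model would produce the logits.

```python
import math
import random

MASK = -1  # hypothetical sentinel for a masked position


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel(rng):
    """Draw one standard Gumbel sample, clamping to avoid log(0)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def maskgit_style_step(tokens, logits, k, temperature, rng):
    """Sample-then-choose: propose a token everywhere, keep the k most confident."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    proposals, scores = {}, {}
    for i in masked:
        p = softmax(logits[i])
        tok = rng.choices(range(len(p)), weights=p, k=1)[0]
        proposals[i] = tok
        # Confidence is the log-probability of the proposed token plus
        # Gumbel noise scaled by the temperature.
        scores[i] = math.log(p[tok]) + temperature * gumbel(rng)
    keep = sorted(masked, key=scores.get, reverse=True)[:k]
    out = list(tokens)
    for i in keep:
        out[i] = proposals[i]
    return out


# Toy usage: positions 2 and 3 have sharply peaked predictions, position 0 is uniform.
toy_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [0.0, 50.0, 0.0]]
maskgit_result = maskgit_style_step([MASK, 7, MASK, MASK], toy_logits, k=2,
                                    temperature=0.0, rng=random.Random(0))
```

With `temperature=0.0` the noise term vanishes and the step keeps the two most confident proposals; larger temperatures randomize which positions are unmasked. Because the kept positions depend on the tokens that were sampled, the kept tokens no longer follow the model's plain per-position distribution, which is the kind of implicit effect the analysis examines.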

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces moment sampler with choose-then-sample approach

Develops partial caching technique for transformer efficiency

Proposes hybrid method for adaptive unmasking trade-off
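The choose-then-sample idea in the first item can be illustrated with a generic sketch: positions are picked first, and tokens are sampled only at the picked positions. This is not the paper's moment sampler. The position score used here (the largest predicted probability) is an assumption chosen for simplicity, and `MASK`, `choose_then_sample_step`, and the toy logits are hypothetical.

```python
import math
import random

MASK = -1  # hypothetical sentinel for a masked position


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def choose_then_sample_step(tokens, logits, k, rng):
    """Choose k masked positions first, then sample tokens only at those positions."""
    masked = [i for i, t in enumerate(tokens) if t == MASK]
    probs = {i: softmax(logits[i]) for i in masked}
    # Step 1 (choose): rank positions by a score that does not depend on any
    # sampled token. Max probability is an assumed stand-in score.
    chosen = sorted(masked, key=lambda i: max(probs[i]), reverse=True)[:k]
    # Step 2 (sample): draw a token at each chosen position from its distribution.
    out = list(tokens)
    for i in chosen:
        out[i] = rng.choices(range(len(probs[i])), weights=probs[i], k=1)[0]
    return out


# Toy usage: position 0 is the least certain, so it stays masked when k=2.
demo_logits = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
cts_result = choose_then_sample_step([MASK, 7, MASK, MASK], demo_logits, k=2,
                                     rng=random.Random(0))
```

Separating the two steps means the choice of positions never depends on a sampled token, which makes the token distribution at each chosen position easy to state. It also means only the chosen positions need a sampling call.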
