Remasking Discrete Diffusion Models with Inference-Time Scaling

📅 2025-03-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing discrete masked diffusion models lack iterative refinement capability during inference: once tokens are generated, they cannot be corrected, limiting generation quality and controllability in text, image, and molecular design tasks. To address this, we propose ReMDM, a novel sampling algorithm that introduces a learnable remasking backward process for pretrained discrete diffusion models, enabling dynamic resetting and updating of already-generated tokens at inference time. ReMDM achieves computationally scalable sampling: performance approaches that of autoregressive models as step count increases, while remaining robust under reduced steps. Empirically, it significantly improves text generation quality, discrete image fidelity, and controllability/precision in conditional molecular structure generation. Across multiple benchmarks, ReMDM advances the Pareto frontier of controllable discrete generation, establishing new state-of-the-art trade-offs between speed, quality, and guidance accuracy.

๐Ÿ“ Abstract
Part of the success of diffusion models stems from their ability to perform iterative refinement, i.e., repeatedly correcting outputs during generation. However, modern masked discrete diffusion lacks this capability: when a token is generated, it cannot be updated again, even when it introduces an error. Here, we address this limitation by introducing the remasking diffusion model (ReMDM) sampler, a method that can be applied to pretrained masked diffusion models in a principled way and that is derived from a discrete diffusion model with a custom remasking backward process. Most interestingly, ReMDM endows discrete diffusion with a form of inference-time compute scaling. By increasing the number of sampling steps, ReMDM generates natural language outputs that approach the quality of autoregressive models, whereas when the computation budget is limited, ReMDM better maintains quality. ReMDM also improves sample quality of masked diffusion models for discretized images, and in scientific domains such as molecule design, ReMDM facilitates diffusion guidance and pushes the Pareto frontier of controllability relative to classical masking and uniform noise diffusion. We provide the code along with a blog post on the project page: https://remdm.github.io.
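The core idea described in the abstract, namely a backward process that can remask already-generated tokens so later steps may revise them, can be sketched in a few lines. The following is a minimal illustrative sketch, not the authors' implementation: the denoiser is a hypothetical stand-in for a pretrained masked diffusion model, and the constant remasking probability is a simplifying assumption (the paper derives a principled, schedule-dependent remasking process).

```python
import random

MASK = "<mask>"

def denoise(tokens):
    # Hypothetical stand-in for a pretrained masked diffusion model's
    # denoiser: fills every masked position with a sampled token.
    return [t if t != MASK else random.choice("abcd") for t in tokens]

def remdm_sample(length, steps, remask_prob=0.1):
    """Sketch of a ReMDM-style sampling loop (illustrative only).

    Each reverse step fills masked positions, then remasks some
    already-decoded tokens with a small probability, so that later
    steps get a chance to correct them. Standard masked diffusion
    sampling corresponds to remask_prob = 0.
    """
    tokens = [MASK] * length
    for step in range(steps):
        tokens = denoise(tokens)
        if step < steps - 1:  # never remask on the final step
            tokens = [MASK if random.random() < remask_prob else t
                      for t in tokens]
    return tokens
```

More sampling steps give more opportunities for remasking and revision, which is the mechanism behind the inference-time compute scaling the abstract describes.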
Problem

Research questions and friction points this paper is trying to address.

Masked discrete diffusion models cannot revise a token once it is generated, so errors persist through sampling.
Natural language generation quality of discrete diffusion lags behind autoregressive models.
Guidance and controllability are limited in scientific domains such as molecule design.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the remasking diffusion model (ReMDM) sampler.
Enables inference-time compute scaling for discrete diffusion.
Improves sample quality in language and image domains.
🔎 Similar Papers
No similar papers found.