🤖 AI Summary
Discrete diffusion models face a fundamental trade-off between generation quality and sampling efficiency when reducing the number of denoising steps. To address this, we propose a prediction–correction sampling framework, whose core innovation is an information-guided corrector: it explicitly models and suppresses forward cumulative errors using the model’s own intermediate outputs. The corrector employs a hollow Transformer architecture—designed to reduce parameter redundancy while preserving long-range dependencies—and a customized loss function that enhances correction accuracy and improves training signal utilization. Experiments on tokenized ImageNet 256×256 demonstrate that our method achieves state-of-the-art performance with significantly fewer sampling steps (e.g., 16 steps), reducing the FID score by up to 15.3% over strong baselines. It thus delivers both accelerated sampling and superior fidelity, establishing a new paradigm for efficient discrete diffusion sampling.
📝 Abstract
Discrete diffusion has emerged as a powerful framework for generative modeling in discrete domains, yet efficiently sampling from these models remains challenging. Existing sampling strategies often struggle to balance computation and sample quality when the number of sampling steps is reduced, even when the model has learned the data distribution well. To address these limitations, we propose a predictor-corrector sampling scheme where the corrector is informed by the diffusion model to more reliably counter the accumulating approximation errors. To further enhance the effectiveness of our informed corrector, we introduce complementary architectural modifications based on hollow transformers and a simple tailored training objective that leverages more training signal. We use a synthetic example to illustrate the failure modes of existing samplers and show how informed correctors alleviate these problems. On tokenized ImageNet 256x256, this approach consistently produces superior samples with fewer steps, achieving improved FID scores for discrete diffusion models. These results underscore the potential of informed correctors for fast and high-fidelity generation using discrete diffusion.