🤖 AI Summary
This work addresses a key limitation of discrete diffusion foundation models in inverse problem solving: their reliance on task-specific fine-tuning. We propose Anchored Posterior Sampling (APS), a fine-tuning-free Bayesian image reconstruction framework. APS introduces quantized expectations in the discrete embedding space to enable gradient-based measurement guidance, and an anchored re-masking mechanism for adaptive, robust iterative decoding. The method handles both linear and nonlinear inverse problems (including super-resolution, denoising, and compressive sensing) in a unified way, and achieves state-of-the-art performance among discrete diffusion samplers on standard benchmarks. APS further extends to training-free, text-guided image editing and style transfer, demonstrating strong cross-task generalization. Its core contribution is the first direct application of pre-trained discrete diffusion models to general, training-free posterior inference: principled Bayesian reconstruction without any parameter adaptation.
📝 Abstract
We study the problem of posterior sampling using pretrained discrete diffusion foundation models, aiming to recover images from noisy measurements without retraining task-specific models. While diffusion models have achieved remarkable success in generative modeling, most advances rely on continuous Gaussian diffusion. In contrast, discrete diffusion offers a unified framework for jointly modeling categorical data such as text and images. Beyond unification, discrete diffusion provides faster inference, finer control, and principled training-free Bayesian inference, making it particularly well-suited for posterior sampling. However, existing approaches to discrete diffusion posterior sampling face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS) for masked diffusion foundation models, built on two key innovations: quantized expectation for gradient-like guidance in the discrete embedding space, and anchored remasking for adaptive decoding. Our approach achieves state-of-the-art performance among discrete diffusion samplers across linear and nonlinear inverse problems on standard benchmarks. We further demonstrate the benefits of our approach in training-free stylization and text-guided editing.
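The two mechanisms named above can be pictured with a minimal NumPy sketch: an expected (probability-weighted) embedding that a measurement-consistency gradient could flow through before being snapped back to the nearest codebook entry, and confidence-based remasking that anchors high-confidence positions while re-decoding the rest. All shapes, the mask id, and the median-confidence threshold here are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (shapes are illustrative, not the paper's):
V, d, L = 8, 4, 6              # vocab size, embedding dim, sequence length
E = rng.normal(size=(V, d))    # token embedding table / codebook
logits = rng.normal(size=(L, V))  # per-position model logits


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


# --- Quantized expectation ------------------------------------------------
# Expected embedding under each position's categorical distribution. This
# continuous surrogate is differentiable in the logits, so a measurement
# loss could be back-propagated through it; here we only form it and then
# quantize back to the discrete codebook (nearest entry in Euclidean norm).
probs = softmax(logits)                                      # (L, V)
x_bar = probs @ E                                            # (L, d)
dists = ((x_bar[:, None, :] - E[None, :, :]) ** 2).sum(-1)   # (L, V)
tokens = dists.argmin(axis=1)                                # (L,)

# --- Anchored remasking ---------------------------------------------------
# Anchor positions the model is confident about; remask the rest so they
# are re-decoded in the next iteration (threshold choice is an assumption).
MASK = -1                              # illustrative mask id
conf = probs.max(axis=1)               # per-position confidence
tau = np.quantile(conf, 0.5)           # keep roughly the top half
anchored = conf >= tau
next_tokens = np.where(anchored, tokens, MASK)
```

In a full sampler this quantize-and-remask step would repeat, with the measurement gradient nudging the logits between iterations; the sketch shows only a single pass.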