🤖 AI Summary
Current text-to-image models such as FLUX often produce visual and anatomical artifacts caused by biased sampling trajectories, and existing post-processing approaches struggle to intervene during the core generation phase. This work proposes a training-free, inference-stage trajectory correction method that steers the generation path away from artifact-inducing latent states by reconstructing an estimate of the clean sample at each denoising step. The approach enables zero-shot artifact suppression in standard diffusion and flow-matching models without modifying model weights or requiring additional training, and it significantly enhances image fidelity at no extra training cost.
📝 Abstract
Despite impressive results from recent text-to-image models like FLUX, visual and anatomical artifacts remain a significant hurdle for practical and professional use. Existing methods for artifact reduction typically work in a post-hoc manner and consequently fail to intervene effectively during the core image formation process. Notably, current techniques either require problematic and invasive modifications to the model weights or depend on a computationally expensive and time-consuming process of regional refinement. To address these limitations, we propose DIAMOND, a training-free method that applies trajectory correction to mitigate artifacts during inference. By reconstructing an estimate of the clean sample at every step of the generative trajectory, DIAMOND actively steers the generation process away from latent states that lead to artifacts. Furthermore, we extend the proposed method to standard diffusion models, demonstrating that DIAMOND provides a robust, zero-shot path to high-fidelity, artifact-free image synthesis without the need for additional training or weight modifications in modern generative architectures. Code is available at https://gmum.github.io/DIAMOND/.
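To make the core idea concrete, the general pattern of inference-time trajectory correction via a clean-sample estimate can be sketched as follows. This is a minimal illustration, not the DIAMOND implementation: it assumes a rectified-flow parameterization (where `x_t = (1 - t) * x0 + t * noise` and the model predicts the velocity, so the clean-sample estimate is `x0_hat = x_t - t * v`), and `artifact_score`/`artifact_grad` are hypothetical stand-ins for whatever artifact signal the actual method uses to steer the latent.

```python
import numpy as np

def artifact_score(x0_hat):
    # Hypothetical stand-in for an artifact detector: here we simply
    # penalize out-of-range pixel values in the clean-sample estimate.
    return float(np.sum(np.clip(np.abs(x0_hat) - 1.0, 0.0, None) ** 2))

def artifact_grad(x0_hat):
    # Gradient of artifact_score with respect to x0_hat.
    excess = np.clip(np.abs(x0_hat) - 1.0, 0.0, None)
    return 2.0 * excess * np.sign(x0_hat)

def corrected_sampler(velocity_fn, x, steps=10, guidance=0.5):
    """Euler sampler for a rectified flow, integrating from t = 1 (noise)
    down to t = 0 (data), with a trajectory correction at each step."""
    ts = np.linspace(1.0, 0.0, steps + 1)
    for t, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t)        # model's predicted velocity at (x, t)
        x0_hat = x - t * v           # clean-sample estimate at this step
        # Steer the latent away from artifact-prone states before stepping.
        x = x - guidance * artifact_grad(x0_hat)
        x = x + (t_next - t) * v     # standard Euler update toward the data
    return x
```

For a toy velocity field `v(x, t) = x / t` (a rectified flow whose clean endpoint is the all-zeros image), the sampler converges to zeros and the correction term stays inactive, since the clean-sample estimate never leaves the valid range. In a real pipeline, `velocity_fn` would wrap the flow-matching network, and the guidance strength would trade off artifact suppression against prompt fidelity.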