🤖 AI Summary
Traditional diffusion models employ isotropic Gaussian denoising, which treats all spatial regions uniformly and therefore struggles to preserve image edges and structural details. To address this, we propose Edge-Aware Diffusion (EADiff), a generative model that incorporates principles from anisotropic diffusion in image processing: an edge-detection-guided adaptive noise scheduler dynamically blends edge-preserving noise with standard isotropic Gaussian noise, strengthening the model's representation of low-to-mid-frequency content that encodes shape and structure. On unconditional image generation, EADiff achieves improvements of up to 30% in FID and CLIP scores, and it is markedly more robust in structure-sensitive tasks such as stroke-to-image generation. The core contribution is a structure-aware noise scheduling mechanism that injects geometric priors, which the conventional diffusion process neglects, while improving both sample quality and convergence speed.
📝 Abstract
Classical generative diffusion models learn an isotropic Gaussian denoising process, treating all spatial regions uniformly and thus neglecting potentially valuable structural information in the data. Inspired by the long-established work on anisotropic diffusion in image processing, we present a novel edge-preserving diffusion model that generalizes existing isotropic models through a hybrid noise scheme. In particular, we introduce an edge-aware noise scheduler that varies between edge-preserving and isotropic Gaussian noise. We show that our model's generative process converges faster to results that more closely match the target distribution. We demonstrate its capability to better learn the low-to-mid frequencies within the dataset, which play a crucial role in representing shapes and structural information. Our edge-preserving diffusion process consistently outperforms state-of-the-art baselines in unconditional image generation. It is also markedly more robust for generative tasks guided by a shape-based prior, such as stroke-to-image generation. We present qualitative and quantitative results (FID and CLIP score) showing consistent improvements of up to 30% for both tasks.
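The abstract does not specify how the edge-aware scheduler interpolates between the two noise types, so the following is only a minimal NumPy sketch of the general idea under stated assumptions: `edge_map` (a simple gradient-magnitude detector) and `edge_aware_noise` (a linearly decaying edge-preservation weight `lam`) are hypothetical helpers, not the paper's actual formulation.

```python
import numpy as np

def edge_map(img):
    """Normalized gradient-magnitude edge map in [0, 1] (forward differences)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    mag = np.hypot(gx, gy)
    return mag / (mag.max() + 1e-8)

def edge_aware_noise(img, t, T, rng):
    """Blend edge-preserving and isotropic noise at forward-process step t.

    Assumed schedule (for illustration only): the edge-preservation weight
    `lam` decays linearly from 1 at t=0 to 0 at t=T, so early steps attenuate
    noise on strong edges while late steps reduce to plain isotropic
    Gaussian noise.
    """
    e = edge_map(img)
    lam = 1.0 - t / T
    scale = 1.0 - lam * e          # smaller noise amplitude on strong edges
    return scale * rng.standard_normal(img.shape)
```

On a step-edge test image, this sketch suppresses noise at the edge column when `t` is small and recovers standard Gaussian noise as `t` approaches `T`, mirroring the paper's described transition from edge-preserving to isotropic noise.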