DiffDoctor: Diagnosing Image Diffusion Models Before Treating

📅 2025-01-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the challenges of artifact localization and interpretable repair in text-to-image diffusion models, this paper proposes a two-stage “diagnose-then-treat” optimization framework. In the first stage, a pixel-level artifact detector is trained to identify defects and their locations. In the second stage, the detector's confidence map is integrated into the diffusion reverse process via gradient modulation and a pixel-wise weighted loss to guide targeted artifact correction. The key contributions are: (i) the first introduction of localization-aware diagnostic modeling into diffusion optimization; and (ii) construction of a million-scale dataset of defective images with a human-in-the-loop annotation protocol. Experiments across multiple mainstream diffusion models report an average 42.7% reduction in artifact rate, an FID improvement of 3.2, and an mAP@0.5 of 68.9, indicating both strong visual interpretability and restoration efficacy.

📝 Abstract

Despite recent progress, image diffusion models still produce artifacts. A common solution is to refine an established model with a quality assessment system, which generally rates an image in its entirety. In this work, we believe problem-solving starts with identification: the model should be aware of not just the presence of defects in an image, but also their specific locations. Motivated by this, we propose DiffDoctor, a two-stage pipeline that helps image diffusion models generate fewer artifacts. Concretely, the first stage develops a robust artifact detector, for which we collect a dataset of over 1M flawed synthesized images and set up an efficient human-in-the-loop annotation process that incorporates a carefully designed class-balance strategy. In the second stage, the learned artifact detector is used to tune the diffusion model by assigning a per-pixel confidence map to each synthesized image. Extensive experiments on text-to-image diffusion models demonstrate the effectiveness of our artifact detector as well as the soundness of our diagnose-then-treat design.
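
To make the idea of a per-pixel confidence map more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names `artifact_penalty` and `pixel_shares` are hypothetical, the map is assumed to hold an artifact probability in [0, 1] for every pixel, and reducing the map to its mean is only one plausible way to turn it into a scalar training signal.

```python
from typing import List

# Assumption: a confidence map is a 2D grid of artifact probabilities in [0, 1].
ConfidenceMap = List[List[float]]


def artifact_penalty(conf_map: ConfidenceMap) -> float:
    """Mean per-pixel artifact confidence; lower means a cleaner image."""
    values = [p for row in conf_map for p in row]
    if not values:
        raise ValueError("confidence map is empty")
    return sum(values) / len(values)


def pixel_shares(conf_map: ConfidenceMap) -> ConfidenceMap:
    """Fraction of the total penalty contributed by each pixel.

    Shows where a localized signal concentrates, in contrast to a single
    whole-image score that carries no location information.
    """
    total = sum(p for row in conf_map for p in row)
    if total == 0:
        # No artifacts detected anywhere: no pixel carries any penalty.
        return [[0.0 for _ in row] for row in conf_map]
    return [[p / total for p in row] for row in conf_map]


# Hypothetical 2x2 detector output: the bottom row is flagged as flawed.
conf_map = [[0.0, 0.0],
            [0.5, 1.0]]

penalty = artifact_penalty(conf_map)
shares = pixel_shares(conf_map)
```

In this toy example the whole penalty comes from the two flagged pixels in the bottom row, so a tuning step driven by such a map would be steered toward the flawed region rather than the image as a whole.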

Problem

Research questions and friction points this paper is trying to address.

Image Diffusion Technique

Accuracy in Identifying Issues

Efficient Problem Solving

Innovation

Methods, ideas, or system contributions that make the work stand out.

DiffDoctor

Error Localization

Image Quality Enhancement
