Diffusion Models in Low-Level Vision: A Survey

📅 2024-06-17
🏛️ arXiv.org
📈 Citations: 13
Influential: 0
🤖 AI Summary
To address the lack of a systematic survey of diffusion models in low-level vision, this paper presents a comprehensive review of diffusion model-based techniques, accompanied by a curated list spanning over 20 low-level vision tasks. It presents three generic diffusion modeling frameworks and explores their correlations with other deep generative models as a theoretical foundation. A multi-perspective categorization, organized by both underlying framework and target task, is introduced, and extended diffusion models for other scenarios (medical imaging, remote sensing, video) are summarized. The paper reviews commonly used benchmarks and evaluation metrics and evaluates diffusion model-based techniques on three prominent tasks, covering both performance and efficiency. It closes by discussing the limitations of current diffusion models and proposing seven directions for future research.

📝 Abstract
Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have been widely acclaimed for their ability to produce samples of superior quality and diversity, yielding visually compelling results with intricate texture information. Despite their remarkable success, a comprehensive survey that amalgamates these pioneering diffusion model-based works and organizes the corresponding threads is still missing. This paper presents a comprehensive review of diffusion model-based techniques. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task. Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for future research. This comprehensive examination aims to facilitate a profound understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.
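
The forward diffusion process mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard DDPM-style formulation with a linear noise schedule, which may differ from the three frameworks the survey actually presents. The function names, the schedule defaults (1e-4 to 0.02), and the toy 1-D "image" are illustrative assumptions, not taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed defaults)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I).

    t is a 0-indexed step. Returns the noisy sample and the noise that was added.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


def recover_x0(xt, noise, t, alpha_bars):
    """Invert the forward step given the noise.

    In a trained model the noise would be predicted by a denoising network;
    here the true noise is passed in, so the recovery is exact up to rounding.
    """
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.25, 1.0, 0.0]          # toy 1-D "image"
    xt, eps = forward_diffuse(x0, 500, alpha_bars, rng)
    print("noisy sample:", xt)
    print("recovered:   ", recover_x0(xt, eps, 500, alpha_bars))
```

The reverse denoising process then amounts to estimating the added noise (or the clean signal) from the noisy sample at each step; `recover_x0` shows the algebra such an estimate would be plugged into.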
Problem

Research questions and friction points this paper is trying to address.

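No comprehensive survey yet consolidates and organizes diffusion model-based work in low-level vision.
The correlations between generic diffusion modeling frameworks and other deep generative models need a clear theoretical account.
Existing techniques need to be evaluated for both performance and efficiency, and their limitations and future directions identified.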
Innovation

Methods, ideas, or system contributions that make the work stand out.

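Comprehensive survey of diffusion model-based techniques in low-level vision, with a curated list covering over 20 tasks.
Multi-perspective categorization by both underlying framework and target task.
Performance and efficiency evaluation on three prominent tasks, plus seven directions for future research.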
Authors

Chunming He
Duke University | Tsinghua University
Computer Vision, Machine Learning, Biomedical Image Analysis

Yuqi Shen
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen 518055, China

Chengyu Fang
Tsinghua University & Alibaba DAMO Academy
Computer Vision, Medical AI, Efficient MLLM

Fengyang Xiao
School of Mathematics (Zhuhai), Sun Yat-sen University, Zhuhai 510275, China

Longxiang Tang
Tsinghua University
Computer Vision

Yulun Zhang
School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China

Wangmeng Zuo
School of Computer Science and Technology, Harbin Institute of Technology
Computer Vision, Image Processing, Generative AI, Deep Learning, Biometrics

Z. Guo
Tianyijiaotong Technology Ltd., Suzhou 215131, China

Xiu Li
Bytedance Seed
Computer Vision, Computer Graphics, 3D Vision