DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance and efficiency bottlenecks of image inversion in diffusion models, which stem from the absence of ground-truth noise supervision. The authors propose DeepInv, a self-supervised inversion method that, to their knowledge, is the first to train a parameterized solver that maps images to noise step by step without ground-truth noise labels. Using a self-supervised objective and a data augmentation strategy, DeepInv generates high-quality pseudo-noise from real images, and it adds an iterative, multi-scale training regime to improve inversion accuracy and speed. On the COCO dataset, the paper reports +40.435% SSIM over EasyInv and +9887.5% inference speed over ReNoise.

📝 Abstract
Diffusion inversion is the task of recovering the noise of an image in a diffusion model, which is vital for controllable diffusion-based image editing. At present, diffusion inversion remains challenging due to the lack of viable supervision signals. Most existing methods therefore resort to approximation-based solutions, which often come at the cost of performance or efficiency. To remedy these shortcomings, we propose a novel self-supervised diffusion inversion approach, termed Deep Inversion (DeepInv). Instead of requiring ground-truth noise annotations, we introduce a self-supervised objective and a data augmentation strategy to generate high-quality pseudo noise from real images without manual intervention. Building on these two designs, DeepInv is equipped with an iterative, multi-scale training regime to train a parameterized inversion solver, thereby achieving fast and accurate image-to-noise mapping. To the best of our knowledge, this is the first attempt to present a trainable solver that predicts inversion noise step by step. Extensive experiments show that DeepInv achieves much better performance and inference speed than the compared methods, e.g., +40.435% SSIM over EasyInv and +9887.5% speed over ReNoise on the COCO dataset. Moreover, our careful design of trainable solvers can also provide insights to the community. Code and model parameters will be released at https://github.com/potato-kitty/DeepInv.
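
The reported gains are stated as percentages, which can be hard to compare with "N times faster" style claims. The short sketch below converts them into multiplicative ratios. It assumes each "+X%" figure is a relative improvement over the baseline value (new = baseline × (1 + X/100)); the abstract does not spell out this definition, and the function name `percent_gain_to_ratio` is hypothetical, introduced only for illustration.

```python
def percent_gain_to_ratio(gain_percent: float) -> float:
    """Convert a relative gain such as +40.435% into a multiplicative ratio.

    Assumption (not stated in the abstract): "+X%" means
    new_value = baseline_value * (1 + X / 100).
    """
    return 1.0 + gain_percent / 100.0


# Figures quoted in the abstract for the COCO dataset.
ssim_ratio = percent_gain_to_ratio(40.435)   # DeepInv SSIM relative to EasyInv
speed_ratio = percent_gain_to_ratio(9887.5)  # DeepInv speed relative to ReNoise

print(f"SSIM ratio vs. EasyInv: {ssim_ratio:.5f}x")
print(f"Speed ratio vs. ReNoise: {speed_ratio:.3f}x")
```

Under that assumption, the speed figure corresponds to a ratio of roughly 99.9×, and the SSIM figure to a ratio of roughly 1.4×.
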
Problem

Research questions and friction points this paper is trying to address.

diffusion inversion
noise recovery
self-supervised learning
image editing
diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
diffusion inversion
trainable solver
pseudo noise generation
multi-scale training
Ziyue Zhang
Xiamen University
Luxi Lin
Xiamen University
Xiaolin Hu
Xiamen University
Chao Chang
National University of Defense Technology
Huaixi Wang
National University of Defense Technology
Yiyi Zhou
Xiamen University
deep learning, language and vision
Rongrong Ji
Xiamen University