DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion

📅 2026-01-04

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance and efficiency bottlenecks of image inversion for diffusion models, which stem from the absence of ground-truth noise supervision. The authors propose DeepInv, a self-supervised inversion method that, to the best of their knowledge, is the first to train a parametric solver that maps images to noise without ground-truth noise labels. A self-supervised objective and a data augmentation strategy generate high-quality pseudo-noise from real images, and an iterative multi-scale training regime further improves inversion accuracy and speed. On the COCO dataset, the authors report +40.435% SSIM over EasyInv and +9887.5% inference speed over ReNoise, roughly two orders of magnitude faster.
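The reported percentage gains are easier to compare as multiplicative factors. The short sketch below does that conversion; it assumes each "+p%" figure is a relative gain over the baseline (the source does not state this explicitly), and the helper name `percent_gain_to_factor` is hypothetical.

```python
def percent_gain_to_factor(percent_gain):
    """Convert a relative gain such as +40.435% into a multiplicative factor.

    Assumption: "+p%" means new_value = baseline * (1 + p / 100).
    """
    return 1.0 + percent_gain / 100.0


ssim_factor = percent_gain_to_factor(40.435)   # DeepInv vs. EasyInv (SSIM)
speed_factor = percent_gain_to_factor(9887.5)  # DeepInv vs. ReNoise (speed)

print(f"SSIM factor:  {ssim_factor:.5f}x")
print(f"Speed factor: {speed_factor:.3f}x")
```

Under that reading, the SSIM figure corresponds to about 1.40 times the baseline score and the speed figure to just under 100 times the baseline speed.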

📝 Abstract

Diffusion inversion is the task of recovering the noise that corresponds to a given image in a diffusion model, and it is vital for controllable diffusion-based image editing. Diffusion inversion remains challenging because viable supervision signals are lacking. Most existing methods therefore resort to approximation-based solutions, which often come at the cost of performance or efficiency. To remedy these shortcomings, we propose a novel self-supervised diffusion inversion approach, termed Deep Inversion (DeepInv). Instead of requiring ground-truth noise annotations, we introduce a self-supervised objective and a data augmentation strategy to generate high-quality pseudo noise from real images without manual intervention. Building on these two designs, DeepInv uses an iterative, multi-scale training regime to train a parameterized inversion solver, thereby achieving fast and accurate image-to-noise mapping. To the best of our knowledge, this is the first attempt to present a trainable solver that predicts inversion noise step by step. Extensive experiments show that DeepInv achieves much better performance and inference speed than the compared methods, e.g., +40.435% SSIM over EasyInv and +9887.5% speed over ReNoise on the COCO dataset. Moreover, our careful designs of trainable solvers can also provide insights to the community. Code and model parameters will be released at https://github.com/potato-kitty/DeepInv.
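To make the accuracy-versus-efficiency trade-off of approximation-based inversion concrete, here is a minimal sketch of standard DDIM inversion on a one-dimensional toy problem. It is not the DeepInv solver, whose details are not given here. The assumptions are: scalar data drawn from N(0, 1), for which the noise predictor `eps_model` is exact, and a made-up cosine-style schedule `alpha_bar`; all function names are hypothetical.

```python
import math


def alpha_bar(t, T):
    """Toy cumulative signal level: 1.0 at t = 0, close to 0 at t = T (illustrative)."""
    return math.cos(0.5 * math.pi * 0.98 * t / T) ** 2


def eps_model(x, a_t):
    """Exact noise predictor when the data distribution is N(0, 1) (toy assumption)."""
    return math.sqrt(1.0 - a_t) * x


def ddim_step(x, a_from, a_to):
    """One deterministic DDIM update from signal level a_from to a_to.

    Moving toward noise (inversion) reuses the noise predicted at the current
    point, which is the approximation that introduces error.
    """
    eps = eps_model(x, a_from)
    x0_pred = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * x0_pred + math.sqrt(1.0 - a_to) * eps


def invert(x0, steps):
    """Map a clean sample to its approximate latent noise (t = 0 up to t = steps)."""
    x = x0
    for i in range(steps):
        x = ddim_step(x, alpha_bar(i, steps), alpha_bar(i + 1, steps))
    return x


def sample(x_T, steps):
    """Map latent noise back to a sample (t = steps down to t = 0)."""
    x = x_T
    for i in range(steps, 0, -1):
        x = ddim_step(x, alpha_bar(i, steps), alpha_bar(i - 1, steps))
    return x


def roundtrip_error(x0, steps):
    """Absolute reconstruction error after inverting and then re-sampling."""
    return abs(sample(invert(x0, steps), steps) - x0)


for n in (5, 50, 500):
    print(n, roundtrip_error(1.0, n))
```

In this toy setting the reconstruction error shrinks only as the number of solver steps grows, so accuracy is bought with extra model evaluations. That is the kind of cost a learned, step-by-step inversion solver aims to avoid.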

Problem

Research questions and friction points this paper is trying to address.

diffusion inversion

noise recovery

self-supervised learning

image editing

diffusion models

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning

diffusion inversion

trainable solver