🤖 AI Summary
Existing video deraining methods rely on synthetic or static-scene paired data and therefore generalize poorly; meanwhile, fine-tuning diffusion models often degrades the pretrained generative prior, limiting effectiveness on real-world dynamic rainy scenes. To address this, we propose the first zero-shot video deraining framework, requiring no paired data and no model fine-tuning. Leveraging only a pre-trained text-to-video diffusion model, our approach combines latent-space inversion, negative-prompt guidance, and a novel attention-switching mechanism to suppress rain-streak artifacts while preserving the structural consistency of dynamic backgrounds. Extensive experiments demonstrate that our method substantially outperforms prior approaches on real-world rainy videos and generalizes across diverse dynamic scenarios. This work establishes a new unsupervised paradigm for video deraining in complex, motion-rich environments.
📝 Abstract
Existing video deraining methods are typically trained on paired datasets that are either synthetic, which limits generalization to real-world rain, or captured by static cameras, which restricts effectiveness in dynamic scenes with background and camera motion. Furthermore, recent work on fine-tuning diffusion models has shown promising results, but fine-tuning tends to weaken the generative prior, limiting generalization to unseen cases. In this paper, we introduce the first zero-shot video deraining method for complex dynamic scenes, requiring neither synthetic data nor model fine-tuning, by leveraging a pretrained text-to-video diffusion model with strong generalization capabilities. By inverting an input video into the diffusion model's latent space, its reconstruction process can be intervened upon and steered away from the model's concept of rain using negative prompting. At the core of our approach is an attention-switching mechanism that we found to be crucial for maintaining dynamic backgrounds and structural consistency between the input and the derained video, mitigating artifacts introduced by naive negative prompting. We validate our approach through extensive experiments on real-world rain datasets, demonstrating substantial improvements over prior methods and robust generalization without supervised training.
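The two ingredients named in the abstract can be illustrated with toy tensors: negative-prompt guidance extrapolates the noise prediction away from a "rain" branch, and attention switching lets the deraining branch's queries attend to the reconstruction branch's keys and values so structure follows the input video. This is a minimal sketch under those assumptions; all function names and shapes here are illustrative, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_prompt_guidance(eps_uncond, eps_rain, scale):
    """Steer the noise prediction away from the 'rain' concept
    (classifier-free-guidance form with a negative prompt).
    scale=0 reproduces the rain branch; scale=1 the unconditional one."""
    return eps_rain + scale * (eps_uncond - eps_rain)

def attention(q, k, v):
    """Plain scaled dot-product attention over toy 2-D arrays."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def switched_attention(q_derain, k_recon, v_recon):
    """Attention switching (sketch): queries from the deraining branch
    attend to keys/values taken from the reconstruction branch,
    anchoring layout and motion to the inverted input video."""
    return attention(q_derain, k_recon, v_recon)

# Toy demo: 4 tokens with 8-dim features standing in for latent patches.
q = rng.standard_normal((4, 8))
k_recon = rng.standard_normal((4, 8))
v_recon = rng.standard_normal((4, 8))
out = switched_attention(q, k_recon, v_recon)

eps_u = rng.standard_normal((4, 8))
eps_r = rng.standard_normal((4, 8))
eps = negative_prompt_guidance(eps_u, eps_r, scale=7.5)
```

In a real pipeline these operations would sit inside the denoising loop of an inverted video latent; the sketch only shows the direction of the guidance and where the keys/values are swapped.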