🤖 AI Summary
This work addresses the challenging problem of rain removal in nighttime videos, where complex interactions between raindrops and artificial lighting severely degrade deraining performance. Existing methods, trained on small-scale synthetic data, struggle to generalize to real-world scenarios. To bridge this gap, the authors present the first large-scale, high-fidelity nighttime rainy video dataset, comprising 600,000 paired 1080p frames, generated using Unreal Engine with physically accurate 3D particle-based modeling of optical effects such as color refraction, occlusion, and rain veiling. Building upon this dataset, they formulate deraining as a video-to-video generation task and establish a new baseline by leveraging the Wan 2.2 model as a strong generative prior. Experiments demonstrate that the proposed approach significantly outperforms existing methods on real nighttime rainy videos, effectively narrowing the domain gap between simulation and reality.
📝 Abstract
Nighttime video deraining is uniquely challenging because raindrops interact with artificial lighting. Unlike daytime white rain, nighttime rain takes on various colors and appears locally illuminated. Existing small-scale synthetic datasets rely on 2D rain overlays and fail to capture these physical properties, causing models to generalize poorly to real-world night rain. Meanwhile, capturing real paired nighttime videos remains impractical because rain effects cannot be isolated from other degradations such as sensor noise. To bridge this gap, we introduce UENR-600K, a large-scale, physically grounded dataset containing 600,000 1080p frame pairs. We use Unreal Engine to simulate rain as 3D particles within virtual environments, ensuring photorealism and physically accurate raindrops while capturing details such as color refraction, scene occlusion, and rain curtains. Leveraging this high-quality data, we establish a new state-of-the-art baseline by adapting the Wan 2.2 video generation model. Our baseline treats deraining as a video-to-video generation task, exploiting strong generative priors to almost entirely close the sim-to-real gap. Extensive benchmarking demonstrates that models trained on our dataset generalize significantly better to real-world videos. Project page: https://showlab.github.io/UENR-600K/.