🤖 AI Summary
This work addresses the limited perception performance of autonomous driving systems in adverse weather. The problem stems mainly from the scarcity of real-world data and from the inability of existing weather-synthesis methods to ensure both high visual fidelity and reusable annotations. The authors propose a controllable adverse-weather video generation framework that balances strong weather stylization with faithful preservation of critical objects through semantics-guided adaptive multi-control fusion. By combining a vanishing-point-anchored temporal synthesis strategy with a mask-based training mechanism, the method generates temporally coherent, structurally consistent, high-quality video sequences from a single static image. On the nuScenes validation set, the approach reduces FID by 50.0% and FVD by 16.1% without a given first frame, and further reduces them by 8.7% and 7.2%, respectively, when the first frame is provided, outperforming prior state-of-the-art methods.
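The percentages above are relative reductions. Assuming the standard definition, with $M$ standing for FID or FVD (lower is better) and "prior" denoting the previous state-of-the-art result under the same setting, the arithmetic is:

$$
\Delta_{\text{rel}} = \frac{M_{\text{prior}} - M_{\text{AutoAWG}}}{M_{\text{prior}}} \times 100\%,
\qquad
M_{\text{AutoAWG}} = \left(1 - \frac{\Delta_{\text{rel}}}{100\%}\right) M_{\text{prior}}.
$$

So a 50.0% relative FID reduction means the new FID is $0.500\,M_{\text{prior}}$ (half the previous value), and a 16.1% relative FVD reduction means the new FVD is $0.839\,M_{\text{prior}}$.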
📝 Abstract
Perception robustness under adverse weather remains a critical challenge for autonomous driving, and the core bottleneck is the scarcity of real-world video data captured in adverse weather. Existing weather-generation approaches struggle to balance visual quality and annotation reusability. We present AutoAWG, a controllable Adverse Weather video Generation framework for Autonomous driving. Our method employs a semantics-guided adaptive fusion of multiple controls to balance strong weather stylization with high-fidelity preservation of safety-critical targets; leverages a vanishing-point-anchored temporal synthesis strategy to construct training sequences from static images, thereby reducing reliance on synthetic data; and adopts masked training to improve long-horizon generation stability. On the nuScenes validation set, AutoAWG significantly outperforms prior state-of-the-art methods: without first-frame conditioning, FID and FVD are reduced by 50.0% and 16.1% in relative terms; with first-frame conditioning, they are further reduced by 8.7% and 7.2%, respectively. Extensive qualitative and quantitative results demonstrate advantages in style fidelity, temporal consistency, and semantic and structural integrity, underscoring the practical value of AutoAWG for improving downstream perception in autonomous driving. Our code is available at: https://github.com/higherhu/AutoAWG
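The abstract does not spell out how the vanishing-point-anchored strategy turns a static image into a training sequence. One plausible reading, assumed here purely for illustration, is to approximate forward ego-motion by zooming progressively about the vanishing point, so that the vanishing point stays fixed while the rest of the scene expands outward across frames. The function name `vp_anchored_zoom_sequence` and its parameters (`vp`, `num_frames`, `max_zoom`) are hypothetical, and this is a minimal sketch with NumPy and nearest-neighbour sampling, not AutoAWG's implementation.

```python
import numpy as np


def vp_anchored_zoom_sequence(image: np.ndarray, vp: tuple[float, float],
                              num_frames: int = 8, max_zoom: float = 1.5) -> np.ndarray:
    """Hypothetical sketch: build a pseudo forward-motion clip from one image
    by zooming in progressively about the vanishing point vp = (vx, vy)."""
    h, w = image.shape[:2]
    vx, vy = vp
    frames = []
    for t in range(num_frames):
        # Zoom factor grows linearly from 1.0 (first frame) to max_zoom (last frame).
        s = 1.0 + (max_zoom - 1.0) * t / max(num_frames - 1, 1)
        # Inverse mapping of a zoom about (vx, vy): x_in = vx + (x_out - vx) / s,
        # so the vanishing point maps onto itself in every frame.
        xs = vx + (np.arange(w) - vx) / s
        ys = vy + (np.arange(h) - vy) / s
        # Nearest-neighbour sampling, clipped to valid pixel indices.
        cols = np.clip(np.rint(xs).astype(int), 0, w - 1)
        rows = np.clip(np.rint(ys).astype(int), 0, h - 1)
        frames.append(image[rows[:, None], cols[None, :]])
    # Shape: (num_frames, h, w, channels) for an (h, w, channels) input.
    return np.stack(frames)


if __name__ == "__main__":
    img = np.random.rand(64, 96, 3)
    clip = vp_anchored_zoom_sequence(img, vp=(48.0, 30.0), num_frames=4, max_zoom=2.0)
    print(clip.shape)
```

A pure zoom ignores parallax and independently moving objects, which is one reason a real pipeline would likely need additional handling (for example, the masked training mentioned above) to make such pseudo-sequences useful.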