Generating time-consistent dynamics with discriminator-guided image diffusion models

📅 2025-05-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing video diffusion models (VDMs) generate high-fidelity dynamics but require expensive end-to-end training, limiting their practical deployment. This paper proposes a discriminator-guided, finetuning-free paradigm that uses pretrained image diffusion models (e.g., DDPM) to synthesize spatiotemporally coherent videos without architectural modifications or parameter updates. The key contribution is a time-consistency discriminator that imposes spatiotemporal constraints during sampling, improving uncertainty calibration and reducing bias. Because the method operates purely at inference time, it avoids the computational cost of training a VDM from scratch. Evaluated on an idealized turbulence simulation and a global precipitation dataset, the approach matches fully trained VDMs in temporal consistency and enables stable centennial-scale climate simulations at daily time steps.
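The paper itself publishes no pseudocode here, so the following is a minimal, hypothetical sketch of how discriminator guidance of a pretrained image DDPM could look at inference time, in classifier-guidance style. The names `image_model` and `discriminator`, the pairwise discriminator signature, and the exact guidance form (shifting the posterior mean along the discriminator gradient) are assumptions, not the paper's procedure.

```python
# Hypothetical sketch: discriminator-guided sampling with a pretrained image
# DDPM. `image_model(x, t)` predicts the noise eps; `discriminator(x_prev, x, t)`
# scores the temporal consistency of a frame pair. Both interfaces are assumed.
import torch

@torch.no_grad()
def guided_sample(image_model, discriminator, x_prev_frame, betas, guidance_scale=1.0):
    """Sample one new frame, nudging each denoising step toward temporal
    consistency with the previous frame."""
    alphas = 1.0 - betas
    alphas_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn_like(x_prev_frame)  # start from pure noise
    for t in reversed(range(len(betas))):
        t_batch = torch.full((x.shape[0],), t, device=x.device, dtype=torch.long)
        eps = image_model(x, t_batch)  # predicted noise
        # Standard DDPM posterior mean for step t.
        mean = (x - betas[t] / torch.sqrt(1.0 - alphas_bar[t]) * eps) / torch.sqrt(alphas[t])
        # Guidance: gradient of the discriminator's consistency score w.r.t. x.
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            score = discriminator(x_prev_frame, x_in, t_batch).sum()
            grad = torch.autograd.grad(score, x_in)[0]
        mean = mean + guidance_scale * betas[t] * grad  # shift mean along the gradient
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x
```

Since the guidance only touches the sampling loop, the image model's weights stay frozen throughout, which is what makes the approach finetuning-free.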

📝 Abstract
Realistic temporal dynamics are crucial for many video generation, processing and modelling applications, e.g. in computational fluid dynamics, weather prediction, or long-term climate simulations. Video diffusion models (VDMs) are the current state-of-the-art method for generating highly realistic dynamics. However, training VDMs from scratch can be challenging and requires large computational resources, limiting their wider application. Here, we propose a time-consistency discriminator that enables pretrained image diffusion models to generate realistic spatiotemporal dynamics. The discriminator guides the sampling inference process and does not require extensions or finetuning of the image diffusion model. We compare our approach against a VDM trained from scratch on an idealized turbulence simulation and a real-world global precipitation dataset. Our approach performs equally well in terms of temporal consistency, shows improved uncertainty calibration and lower biases compared to the VDM, and achieves stable centennial-scale climate simulations at daily time steps.
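The abstract says the discriminator guides the sampling process but does not spell out its training here. Below is a hedged sketch of one plausible training objective: a binary classifier scoring whether a (previous frame, noised candidate frame) pair is a true temporal neighbour, with mismatched pairs built by shuffling the batch. The pairing scheme, the noising of candidate frames, and the `discriminator(x_prev, x_t, step)` signature are assumptions, not the paper's recipe.

```python
# Hypothetical training objective for a time-consistency discriminator.
# Noising the candidate frame to a random diffusion step (an assumption) lets
# the discriminator score the intermediate states seen during guided sampling.
import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, frames, alphas_bar):
    """frames: (B, T, C, H, W) clips from real data; alphas_bar: cumulative
    product of (1 - betas) from the image model's noise schedule."""
    b, t = frames.shape[:2]
    idx = torch.randint(0, t - 1, (b,))
    x_prev = frames[torch.arange(b), idx]      # frame i
    x_next = frames[torch.arange(b), idx + 1]  # true successor ("consistent")
    x_fake = x_next[torch.randperm(b)]         # shuffled successor ("inconsistent")
    step = torch.randint(0, len(alphas_bar), (b,), device=frames.device)
    a = alphas_bar[step].view(b, 1, 1, 1)
    noised_next = torch.sqrt(a) * x_next + torch.sqrt(1 - a) * torch.randn_like(x_next)
    noised_fake = torch.sqrt(a) * x_fake + torch.sqrt(1 - a) * torch.randn_like(x_fake)
    logit_real = discriminator(x_prev, noised_next, step)
    logit_fake = discriminator(x_prev, noised_fake, step)
    return (F.binary_cross_entropy_with_logits(logit_real, torch.ones_like(logit_real))
            + F.binary_cross_entropy_with_logits(logit_fake, torch.zeros_like(logit_fake)))
```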
Problem

Research questions and friction points this paper is trying to address.

Generating realistic temporal dynamics without training video diffusion models from scratch
Improving temporal consistency and uncertainty calibration in spatiotemporal simulations
Enabling stable long-term climate simulations using pretrained image diffusion models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses discriminator-guided image diffusion models
No extensions or finetuning required
Achieves stable long-term climate simulations (a minimal rollout sketch follows below)
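To connect the pieces, here is a hypothetical autoregressive rollout built on the `guided_sample` sketch above: each new frame is generated conditioned on the previous one. The loop is generic, not the paper's exact rollout procedure; 36,500 steps would correspond to roughly a century of daily fields.

```python
# Hypothetical autoregressive rollout using the guided_sample() sketch above.
import torch

def rollout(image_model, discriminator, x0, betas, n_steps=36500):
    """Generate n_steps frames starting from the initial field x0."""
    frames, x = [x0], x0
    for _ in range(n_steps):
        x = guided_sample(image_model, discriminator, x, betas)  # next frame,
        frames.append(x)                                         # given the last
    return torch.stack(frames, dim=1)  # (B, n_steps + 1, C, H, W)
```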
Philipp Hess
Technical University of Munich, Potsdam Institute for Climate Impact Research
Maximilian Gelbrecht
Technical University of Munich, Potsdam Institute for Climate Impact Research
Christof Schötz
Technical University of Munich, Potsdam Institute for Climate Impact Research
Michael Aich
Technical University of Munich, Potsdam Institute for Climate Impact Research
Yu Huang
Technical University of Munich, Potsdam Institute for Climate Impact Research
Shangshang Yang
Technical University of Munich, Potsdam Institute for Climate Impact Research
Niklas Boers
Technical University of Munich, Potsdam Institute for Climate Impact Research, University of Exeter
Earth system dynamics · data-driven modelling · abrupt transitions · extreme events