🤖 AI Summary
Existing video diffusion models (VDMs) generate high-fidelity dynamic videos but require prohibitively expensive end-to-end training, limiting practical deployment. This paper proposes a discriminator-guided, finetuning-free paradigm that leverages pretrained image diffusion models (e.g., DDPM) to directly synthesize spatiotemporally coherent videos, without architectural modifications or parameter updates. The key contribution is a time-consistency discriminator that imposes gradient-free spatiotemporal constraints during sampling, calibrating uncertainty and controlling bias. The method operates solely through inference-time guidance, drastically reducing computational overhead. Evaluated on an idealized turbulence simulation and a global precipitation dataset, the approach matches fully trained VDMs in temporal consistency while enabling stable century-scale climate simulations at daily time steps.
📄 Abstract
Realistic temporal dynamics are crucial for many video generation, processing, and modelling applications, e.g. in computational fluid dynamics, weather prediction, or long-term climate simulation. Video diffusion models (VDMs) are the current state-of-the-art method for generating highly realistic dynamics. However, training VDMs from scratch is challenging and requires large computational resources, limiting their wider application. Here, we propose a time-consistency discriminator that enables pretrained image diffusion models to generate realistic spatiotemporal dynamics. The discriminator guides the sampling process at inference time and requires no extension or finetuning of the image diffusion model. We compare our approach against a VDM trained from scratch on an idealized turbulence simulation and a real-world global precipitation dataset. Our approach matches the VDM in temporal consistency, shows improved uncertainty calibration and lower biases, and achieves stable centennial-scale climate simulations at daily time steps.
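The paper does not spell out how the discriminator steers sampling. One hypothetical, gradient-free reading of "discriminator-guided sampling" is candidate selection: at each reverse-diffusion step, draw several candidate updates from the frozen image model and keep the one the time-consistency discriminator scores highest against the previous frame. The sketch below illustrates only this control flow; `denoise_step` and `consistency_score` are toy stand-ins, not the paper's model or discriminator.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(x, t):
    # Stand-in for one reverse step of a pretrained image diffusion model
    # (a real model would predict and remove noise with a network).
    return 0.9 * x + 0.1 * rng.standard_normal(x.shape)

def consistency_score(prev_frame, frame):
    # Toy proxy for a time-consistency discriminator: higher score for
    # frames that stay close to the previous frame.
    return -np.mean((frame - prev_frame) ** 2)

def guided_sample(prev_frame, steps=10, candidates=4):
    """Gradient-free guidance by candidate selection: at each step, sample
    several candidate updates and keep the highest-scoring one. The image
    model itself is never modified or finetuned."""
    x = rng.standard_normal(prev_frame.shape)
    for t in range(steps):
        cands = [denoise_step(x, t) for _ in range(candidates)]
        x = max(cands, key=lambda c: consistency_score(prev_frame, c))
    return x

prev = np.zeros((8, 8))           # previous video frame
frame = guided_sample(prev)       # next frame, steered toward consistency
```

Rolling `guided_sample` forward frame by frame would yield a video; because guidance acts only at inference time, the same frozen image model can in principle be run for arbitrarily long sequences, which is what makes the centennial-scale simulations in the abstract feasible.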