🤖 AI Summary
Existing remote sensing image generation methods struggle to simulate future land surface changes under specified scenarios, which limits their applicability in urban planning and land management. To address this, we propose the first multimodal controllable spatiotemporal diffusion model tailored for remote sensing, which formulates the diffusion process as a spatiotemporal bridge linking pre- and post-event land surface states. Our approach uses a Brownian-bridge stochastic process and integrates heterogeneous spatial control signals (textual descriptions, instance layouts, and semantic maps) for controllable future scene synthesis. Extensive experiments demonstrate that our method generates high-fidelity images with strong conditional alignment, accurately capturing event-driven land surface transformations. Both quantitative metrics (e.g., FID, LPIPS, SSIM) and qualitative assessments confirm significant improvements over state-of-the-art baselines.
📝 Abstract
Recent advances in generative methods, especially diffusion models, have driven substantial progress in remote sensing image synthesis. However, existing methods have not explored simulating future scenarios from given scenario images. This simulation capability has wide applications in urban planning, land management, and beyond. In this work, we propose ChangeBridge, a conditional spatiotemporal diffusion model. Given pre-event images and conditioned on multimodal spatial controls (e.g., text prompts, instance layouts, and semantic maps), ChangeBridge can synthesize post-event images. The core idea behind ChangeBridge is to recast the noise-to-image diffusion model as a pre-to-post diffusion bridge. Conditioned on multimodal controls, ChangeBridge leverages a stochastic Brownian-bridge diffusion, directly modeling the spatiotemporal evolution between pre-event and post-event states. To the best of our knowledge, ChangeBridge is the first spatiotemporal generative model with multimodal controls for remote sensing. Experimental results demonstrate that ChangeBridge can simulate high-fidelity future scenarios aligned with given conditions, including both the events themselves and event-driven background variations. Code will be available.
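To make the pre-to-post bridge idea concrete, the following is a minimal sketch of how an intermediate state of a Brownian bridge between two images could be sampled. It is not the ChangeBridge implementation: the abstract does not give the schedule, so the linear mixing weight `m_t = t / num_steps`, the variance `2 * max_variance * m_t * (1 - m_t)`, the choice of the post-event image as the `t = 0` endpoint, and the names `bridge_coefficients` and `sample_bridge_state` are all assumptions made for illustration. Images are represented as flat lists of pixel values, and the multimodal controls, which would condition the learned denoiser, are not modeled here.

```python
import math
import random


def bridge_coefficients(t, num_steps, max_variance=1.0):
    """Return (m_t, std_t) for a Brownian bridge pinned at both endpoints.

    Assumed schedule (illustrative, not taken from the paper):
      m_t   = t / num_steps
      var_t = 2 * max_variance * m_t * (1 - m_t)
    The variance is zero at t = 0 and t = num_steps and peaks at the midpoint.
    """
    m_t = t / num_steps
    variance = 2.0 * max_variance * m_t * (1.0 - m_t)
    return m_t, math.sqrt(variance)


def sample_bridge_state(post_pixels, pre_pixels, t, num_steps, rng, max_variance=1.0):
    """Sample an intermediate state between a post-event and a pre-event image.

    Assumed convention: t = 0 is the post-event image and t = num_steps is
    the pre-event image, so generation would run the process in reverse,
    starting from the pre-event image instead of pure noise.
    """
    m_t, std_t = bridge_coefficients(t, num_steps, max_variance)
    return [
        (1.0 - m_t) * x0 + m_t * y + std_t * rng.gauss(0.0, 1.0)
        for x0, y in zip(post_pixels, pre_pixels)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    post_event = [0.2, 0.8, 0.4]  # hypothetical pixel values
    pre_event = [0.5, 0.1, 0.9]   # hypothetical pixel values
    for step in (0, 5, 10):
        print(step, sample_bridge_state(post_event, pre_event, step, 10, rng))
```

Under these assumptions the process is pinned to the post-event image at `t = 0` and to the pre-event image at `t = num_steps`, with noise injected only in between. This is what distinguishes a bridge from a standard noise-to-image diffusion, whose terminal state is pure Gaussian noise.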