From Statics to Dynamics: Physics-Aware Image Editing with Latent Transition Priors

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image editing methods often produce physically implausible results because they neglect dynamic physical processes such as refraction and deformation. This work reframes physics-aware image editing as temporal prediction of physical states and introduces PhysicEdit, an end-to-end framework built on a novel physical state transition prior. The authors construct PhysicTran38K, a large-scale video dataset of 38K physical transition trajectories via a two-stage, constraint-aware annotation pipeline, and propose a learnable transition query mechanism that supplies timestep-adaptive editing guidance. PhysicEdit couples a frozen Qwen2.5-VL vision-language model for physical reasoning with a diffusion backbone. Experiments demonstrate that PhysicEdit surpasses Qwen-Image-Edit by 5.9% in physical plausibility and by 10.1% in knowledge-guided editing, establishing a new state of the art among open-source methods and matching the performance of leading closed-source models.

📝 Abstract
Instruction-based image editing has achieved remarkable success in semantic alignment, yet state-of-the-art models frequently fail to render physically plausible results when editing involves complex causal dynamics, such as refraction or material deformation. We attribute this limitation to the dominant paradigm that treats editing as a discrete mapping between image pairs, which provides only boundary conditions and leaves transition dynamics underspecified. To address this, we reformulate physics-aware editing as predictive physical state transitions and introduce PhysicTran38K, a large-scale video-based dataset comprising 38K transition trajectories across five physical domains, constructed via a two-stage filtering and constraint-aware annotation pipeline. Building on this supervision, we propose PhysicEdit, an end-to-end framework equipped with a textual-visual dual-thinking mechanism. It combines a frozen Qwen2.5-VL for physically grounded reasoning with learnable transition queries that provide timestep-adaptive visual guidance to a diffusion backbone. Experiments show that PhysicEdit improves over Qwen-Image-Edit by 5.9% in physical realism and 10.1% in knowledge-grounded editing, setting a new state of the art for open-source methods, while remaining competitive with leading proprietary models.
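The abstract's core mechanism — learnable transition queries that condition a diffusion backbone differently at each denoising timestep — can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the query bank, embedding scheme, and all shapes are assumptions made for the sake of the example.

```python
import numpy as np

# Illustrative sketch of "timestep-adaptive guidance via transition queries":
# a bank of query vectors is weighted by attention over a timestep embedding,
# so the guidance signal fed to the diffusion backbone evolves as denoising
# progresses. All names and dimensions here are assumptions.

rng = np.random.default_rng(0)
N_QUERIES, DIM = 8, 16                # number of transition queries, feature dim

queries = rng.standard_normal((N_QUERIES, DIM))   # learnable in a real model

def timestep_embedding(t: int, dim: int = DIM) -> np.ndarray:
    """Standard sinusoidal embedding of the diffusion timestep t."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    angles = t * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def transition_guidance(t: int) -> np.ndarray:
    """Softmax-blend the query bank with timestep-conditioned weights,
    producing one guidance vector per denoising step."""
    t_emb = timestep_embedding(t)
    logits = queries @ t_emb / np.sqrt(DIM)       # per-query attention scores
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()                      # softmax over queries
    return weights @ queries                      # (DIM,) guidance vector

# Guidance at an early (noisy) step differs from a late (refined) step,
# i.e. the conditioning is timestep-adaptive.
g_early = transition_guidance(999)
g_late = transition_guidance(10)
print(np.allclose(g_early, g_late))  # False
```

In the full framework this guidance vector would enter the backbone via cross-attention alongside the frozen Qwen2.5-VL reasoning features; here it is just a standalone forward pass showing why the signal varies with the timestep.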
Problem

Research questions and friction points this paper is trying to address.

physics-aware image editing
causal dynamics
physical plausibility
image editing
state transitions
Innovation

Methods, ideas, or system contributions that make the work stand out.

physics-aware editing
latent transition priors
dynamic state modeling
diffusion-based image editing
video-based trajectory dataset