🤖 AI Summary
Existing methods for 4D respiratory motion modeling rely on high-dose dual-frame acquisitions (start and end frames), yet preoperative low-dose single-frame scans suffer from dynamic background interference within the respiratory cycle, interference that conventional image registration fails to eliminate, leading to inaccurate temporal modeling. To address this, the authors propose the first single-image-to-video (I2V) synthesis framework tailored to 4D respiratory motion modeling. Its core innovation is a temporal differential field modeling mechanism: a temporal differential diffusion model explicitly captures inter-frame relative motion, while integrated prompt-attention and field-enhancement layers couple motion priors tightly with the generative process. Trained end-to-end on the ACDC and 4D Lung datasets, the framework generates 4D videos that evolve along ground-truth motion trajectories, achieving state-of-the-art perceptual quality and temporal consistency.
📝 Abstract
Temporal modeling of regular respiration-induced motion is crucial to image-guided clinical applications. Existing methods cannot simulate temporal motion unless high-dose imaging scans containing both the starting and ending frames are available. However, during preoperative data acquisition, slight patient movement can produce dynamic backgrounds between the first and last frames of a respiratory period. This additional deviation can hardly be removed by image registration and thus degrades temporal modeling. To address this limitation, we are the first to simulate the regular motion process with an image-to-video (I2V) synthesis framework, which animates the first frame to forecast future frames over a given length. Moreover, to promote the temporal consistency of the animated videos, we devise the Temporal Differential Diffusion Model to generate temporal differential fields, which measure the relative differential representations between adjacent frames. A prompt attention layer is devised to refine these differential fields, and a field augmented layer lets the fields interact with the I2V framework, yielding more accurate temporal variation in the synthesized videos. Extensive results on the ACDC cardiac and 4D Lung datasets show that our approach simulates 4D videos along the intrinsic motion trajectory, rivaling competitive methods in perceptual similarity and temporal consistency. Code will be released soon.
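To make the notion of "temporal differential fields" concrete, here is a minimal numeric sketch. In the paper these fields are *generated* by a diffusion model; the sketch below only illustrates the underlying quantity, assuming (hypothetically) that the differential field is a pixel-wise difference between adjacent frames, and that a video can be rolled forward from its first frame by accumulating those fields. Function names are illustrative, not from the paper's code.

```python
import numpy as np

def temporal_differential_fields(frames):
    """Relative differential representations between adjacent frames.

    frames: array of shape (T, H, W), a video with T frames.
    Returns an array of shape (T-1, H, W) where field[t] = frames[t+1] - frames[t].
    """
    frames = np.asarray(frames, dtype=np.float32)
    return frames[1:] - frames[:-1]

def animate_from_first_frame(first_frame, fields):
    """Forecast future frames by accumulating differential fields onto frame 0.

    This mirrors the I2V idea of animating a single starting frame:
    frame[t] = frame[0] + sum(field[0..t-1]).
    """
    return np.asarray(first_frame, dtype=np.float32) + np.cumsum(fields, axis=0)
```

Because each field encodes only *relative* motion between neighbors, supervising the fields (rather than raw frames) pushes the generator toward temporally consistent trajectories, which is the motivation the abstract gives for the differential formulation.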