Temporal Differential Fields for 4D Motion Modeling via Image-to-Video Synthesis

📅 2025-05-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing methods for 4D respiratory motion modeling rely on high-dose dual-frame acquisitions (start/end frames), yet preoperative low-dose single-frame scans suffer from dynamic background interference within the respiratory cycle—interference that conventional image registration fails to eliminate, leading to inaccurate temporal modeling. To address this, we propose the first single-image-to-video (I2V) synthesis framework tailored for 4D respiratory motion modeling. Its core innovation is a temporal differential field modeling mechanism: a temporal differential diffusion model explicitly captures inter-frame relative motion, while integrated prompt-attention and field-enhancement layers enable deep coupling between motion priors and the generative process. Trained end-to-end on ACDC and 4D Lung datasets, the generated 4D videos evolve strictly along ground-truth motion trajectories, achieving state-of-the-art performance in perceptual quality and temporal consistency metrics.

Technology Category

Application Category

📝 Abstract
Temporal modeling on regular respiration-induced motions is crucial to image-guided clinical applications. Existing methods cannot simulate temporal motions unless high-dose imaging scans including starting and ending frames exist simultaneously. However, in the preoperative data acquisition stage, the slight movement of patients may result in dynamic backgrounds between the first and last frames in a respiratory period. This additional deviation can hardly be removed by image registration, thus affecting the temporal modeling. To address that limitation, we pioneeringly simulate the regular motion process via the image-to-video (I2V) synthesis framework, which animates with the first frame to forecast future frames of a given length. Besides, to promote the temporal consistency of animated videos, we devise the Temporal Differential Diffusion Model to generate temporal differential fields, which measure the relative differential representations between adjacent frames. The prompt attention layer is devised for fine-grained differential fields, and the field augmented layer is adopted to better interact these fields with the I2V framework, promoting more accurate temporal variation of synthesized videos. Extensive results on ACDC cardiac and 4D Lung datasets reveal that our approach simulates 4D videos along the intrinsic motion trajectory, rivaling other competitive methods on perceptual similarity and temporal consistency. Codes will be available soon.
Problem

Research questions and friction points this paper is trying to address.

Modeling regular respiratory motion for clinical imaging
Overcoming patient movement artifacts in temporal modeling
Generating consistent 4D videos from single-frame inputs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Image-to-video synthesis for motion simulation
Temporal Differential Diffusion Model for consistency
Prompt attention for fine-grained differential fields
🔎 Similar Papers
No similar papers found.
Xin You
Xin You
Beihang University
Performance Tool、HPC
M
Minghui Zhang
Institute of Medical Robotics, Shanghai Jiao Tong University; Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University
Hanxiao Zhang
Hanxiao Zhang
Nanjing University
J
Jie Yang
Institute of Medical Robotics, Shanghai Jiao Tong University; Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University
Nassir Navab
Nassir Navab
Professor of Computer Science, Technische Universität München