🤖 AI Summary
This work addresses missing frames in top-down drone video of small, low-texture autonomous surface vehicles performing structured manoeuvres at sea. The authors propose a video reconstruction pipeline that uses GPS-derived trajectories to guide a pretrained image-to-video diffusion model (SG-I2V). GPS telemetry is mapped into image space via an equirectangular projection to produce per-vessel motion cues, which, together with a single reference image, condition the model without any domain-specific fine-tuning. Against optical flow extrapolation and RIFE interpolation baselines, SG-I2V comes closest to the ground-truth video in BRISQUE (25.52 vs. 23.64) and temporal smoothness (1.14 vs. 1.42), and achieves the lowest GPS trajectory alignment error (9.31 pixels).
📝 Abstract
This paper addresses the problem of reconstructing missing or dropped frames in top-down drone video of autonomous surface vehicles performing structured maritime manoeuvres. We propose a pipeline that converts raw GPS telemetry and a single reference frame into a trajectory-guided video sequence using a pre-trained image-to-video diffusion model, requiring no domain-specific fine-tuning. GPS coordinates from onboard telemetry logs are projected into image space via an equirectangular mapping, producing per-vessel motion cues that condition the SG-I2V diffusion model. The generated frames are evaluated against ground-truth video using perceptual, temporal and trajectory-based metrics, and benchmarked against optical flow extrapolation and RIFE interpolation baselines. Among all methods, SG-I2V produces the most natural-looking frames (BRISQUE 25.52, closest to the ground-truth value of 23.64), the most realistic motion magnitude (temporal smoothness 1.14 vs. 1.42 for ground truth), and the strongest GPS trajectory adherence (9.31 px vs. 28.70 px for ground truth; the ground-truth figure reflects the approximate temporal alignment between the footage and the GPS logs, not a generation error). These results demonstrate that trajectory-guided diffusion synthesis is a viable approach to maritime video reconstruction under challenging low-texture, small-object conditions.
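To make the GPS-to-image step concrete, the following is a minimal sketch of an equirectangular mapping from latitude/longitude to pixel coordinates, plus a mean pixel distance between two tracks. It is an illustration under stated assumptions, not the authors' implementation: the function names, the north-up camera with no rotation, the known reference point `ref_px`, and the fixed `metres_per_pixel` scale are all hypothetical, and the paper's trajectory metric may be defined differently.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def gps_to_pixel(lat, lon, ref_lat, ref_lon, ref_px, metres_per_pixel):
    """Map a GPS fix to image coordinates with an equirectangular approximation.

    Assumptions (hypothetical, not taken from the paper): the camera looks
    straight down with north pointing up in the image, (ref_lat, ref_lon)
    is known to appear at pixel ref_px, and the ground scale is constant.
    """
    # Local east/north offsets in metres; longitude is scaled by cos(latitude).
    east = math.radians(lon - ref_lon) * math.cos(math.radians(ref_lat)) * EARTH_RADIUS_M
    north = math.radians(lat - ref_lat) * EARTH_RADIUS_M
    # Image x grows eastward; image y grows downward, so north is subtracted.
    x = ref_px[0] + east / metres_per_pixel
    y = ref_px[1] - north / metres_per_pixel
    return (x, y)


def mean_pixel_error(track_a, track_b):
    """Mean Euclidean distance in pixels between two equally long tracks."""
    if len(track_a) != len(track_b) or not track_a:
        raise ValueError("tracks must be non-empty and of equal length")
    return sum(math.dist(a, b) for a, b in zip(track_a, track_b)) / len(track_a)
```

Applying `gps_to_pixel` to each telemetry sample of a vessel yields a pixel-space trajectory of the kind that could serve as a motion cue. The equirectangular approximation is reasonable here because the area covered by a single drone frame is small relative to the Earth's curvature.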