Generative Spatiotemporal Data Augmentation

📅 2025-12-13
🤖 AI Summary
To address the scarcity of spatiotemporal data and the inability of conventional and existing generative methods to capture critical distributional dimensions in low-annotation scenarios (e.g., UAV imagery), this paper pioneers the use of video diffusion models for controllable 3D spatiotemporal augmentation—synthesizing photorealistic, multi-view, dynamic video clips from a single static image. We introduce three practical techniques: (i) geometry-semantic co-guided annotation transfer, (ii) disocclusion-aware synthesis, and (iii) data-efficient generation configuration selection with incremental fine-tuning. Evaluated on COCO subsets and a UAV-specific dataset, our approach significantly improves object detection and instance segmentation performance. It effectively expands the modeled data distribution beyond what prior methods can capture, establishing a scalable, generative augmentation paradigm for few-shot vision tasks.

📝 Abstract
We explore spatiotemporal data augmentation using video foundation models to diversify both camera viewpoints and scene dynamics. Unlike existing approaches based on simple geometric transforms or appearance perturbations, our method leverages off-the-shelf video diffusion models to generate realistic 3D spatial and temporal variations from a given image dataset. Incorporating these synthesized video clips as supplemental training data yields consistent performance gains in low-data settings, such as UAV-captured imagery where annotations are scarce. Beyond empirical improvements, we provide practical guidelines for (i) choosing an appropriate spatiotemporal generative setup, (ii) transferring annotations to synthetic frames, and (iii) addressing disocclusion (regions newly revealed and unlabeled in generated views). Experiments on COCO subsets and UAV-captured datasets show that, when applied judiciously, spatiotemporal augmentation broadens the data distribution along axes underrepresented by traditional and prior generative methods, offering an effective lever for improving model performance in data-scarce regimes.
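The disocclusion issue the abstract raises can be illustrated with a minimal sketch (not the paper's method). Assuming the generated view is related to the source image by a known 3x3 homography, with `H_inv` (a hypothetical name) mapping generated-view pixels back to source coordinates, pixels whose back-projection lands outside the source frame are newly revealed and carry no transferred labels, so they can be flagged as ignore regions during training:

```python
import numpy as np

def disocclusion_mask(h, w, H_inv):
    """Return a boolean (h, w) mask that is True where a pixel in the
    generated view has no correspondence in the source image, i.e. its
    back-projected source coordinate falls outside the source frame.
    Such regions are disoccluded and unlabeled.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous coordinates of every generated-view pixel.
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)], axis=0)
    src = H_inv @ pts
    sx, sy = src[0] / src[2], src[1] / src[2]  # perspective divide
    outside = (sx < 0) | (sx >= w) | (sy < 0) | (sy >= h)
    return outside.reshape(h, w)
```

In a real pipeline the view correspondence would come from the generative model's camera conditioning or from estimated geometry rather than a hand-supplied homography; the mask would then be used to zero out the loss over disoccluded pixels.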
Problem

Research questions and friction points this paper is trying to address.

Generating realistic spatiotemporal video variations for data augmentation
Improving model performance in low-data settings like UAV imagery
Providing guidelines for generative setup, annotation transfer, and disocclusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using video diffusion models for spatiotemporal data augmentation
Generating realistic 3D spatial and temporal variations from images
Providing guidelines for annotation transfer and disocclusion handling
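To make the annotation-transfer idea concrete, here is a minimal sketch under a simplifying assumption: if the geometric relation between the source image and a generated frame is approximated by a known 3x3 homography `H` (hypothetical; the paper's geometry-semantic co-guided transfer is more involved), bounding boxes can be carried over by warping all four corners and re-fitting an axis-aligned box:

```python
import numpy as np

def transfer_boxes(boxes, H):
    """Warp axis-aligned boxes (x1, y1, x2, y2) into a generated view
    via a 3x3 homography H, then re-fit axis-aligned boxes.
    """
    out = []
    for x1, y1, x2, y2 in boxes:
        # Warp all four corners, not just two, so rotation/shear is handled.
        corners = np.array([[x1, y1, 1], [x2, y1, 1],
                            [x2, y2, 1], [x1, y2, 1]], dtype=float)
        warped = corners @ H.T
        warped = warped[:, :2] / warped[:, 2:3]  # perspective divide
        out.append([warped[:, 0].min(), warped[:, 1].min(),
                    warped[:, 0].max(), warped[:, 1].max()])
    return np.array(out)
```

For instance masks, the same warp would be applied densely to the mask pixels instead of box corners.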
Authors
Jinfan Zhou, University of Chicago (Computer Vision, Generative Models)
Lixin Luo, University of Michigan, Ann Arbor
Sungmin Eum, DEVCOM Army Research Laboratory
Heesung Kwon, DEVCOM Army Research Laboratory
Jeong Joon Park, Assistant Professor, University of Michigan (computer vision, computer graphics, artificial intelligence)