🤖 AI Summary
To address the challenges of insufficient pose diversity, low photorealism, and high construction cost in few-shot aerial human detection, this paper proposes an unpaired progressive pose transfer framework. The method first synthesizes a diverse set of novel human poses; it then builds a graph over these poses based on their similarity and applies Dijkstra's algorithm to derive controllable, smoothly evolving pose sequences; finally, it performs style-preserving conditional image translation, iteratively moving images from the existing synthetic dataset through neighboring poses in the sequence to achieve high-fidelity pose augmentation. Crucially, the method requires no paired pose images and no additional synthetic data. Evaluated on three aerial benchmarks (VisDrone, Okutama-Action, and ICG), the approach significantly improves few-shot detection accuracy, demonstrating both effectiveness and cross-dataset generalizability.
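The graph-and-Dijkstra step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each pose is an `(n_joints, 2)` keypoint array, uses plain Euclidean distance as a stand-in similarity measure, and connects each pose to its `k` nearest neighbours before running Dijkstra's algorithm from a source pose to a target pose; intermediate nodes on the shortest path then form a sequence whose adjacent poses are similar.

```python
import heapq

import numpy as np


def pose_distance(p, q):
    # Euclidean distance between two (n_joints, 2) keypoint arrays
    # (a hypothetical stand-in for the paper's pose similarity).
    return float(np.linalg.norm(p - q))


def dijkstra_pose_path(poses, src, dst, k=3):
    """Shortest pose-to-pose path through a k-NN graph over the pose set."""
    n = len(poses)
    dmat = np.array(
        [[pose_distance(poses[i], poses[j]) for j in range(n)] for i in range(n)]
    )
    # For each pose, keep the k nearest other poses ([0] is the pose itself).
    nbrs = [np.argsort(dmat[i])[1 : k + 1] for i in range(n)]

    dist, prev, visited = {src: 0.0}, {}, set()
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v in nbrs[u]:
            v = int(v)
            nd = d + dmat[u, v]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))

    # Reconstruct the src -> dst path; adjacent entries are similar poses.
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]
```

Because edge weights are pose distances, the shortest path cannot jump between dissimilar poses, which is exactly the ordering property the translator relies on.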
📝 Abstract
We present a framework for diversifying human poses in a synthetic dataset for aerial-view human detection. Our method first constructs a set of novel poses using a pose generator and then alters images in the existing synthetic dataset to assume the novel poses while maintaining the original style using an image translator. Since images corresponding to the novel poses are not available during training, the image translator is trained to be applicable only when the input and target poses are similar; training therefore requires neither the novel poses nor their corresponding images. Next, we select a sequence of target novel poses from the novel pose set, using Dijkstra's algorithm to ensure that poses closer to each other are adjacent in the sequence. Finally, we repeatedly apply the image translator to each target pose in the sequence, producing a group of novel pose images that represent a variety of limited body movements from the source pose. Experiments on three aerial-view human detection benchmarks (VisDrone, Okutama-Action, and ICG) demonstrate that, regardless of how the synthetic data is used in training or of its size, training with the pose-diversified synthetic dataset generally yields markedly better accuracy than training with the original synthetic dataset in the few-shot regime.
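The final step, repeatedly applying the translator along the ordered pose sequence, might look like the following minimal sketch. Here `translate` is a hypothetical stand-in for the trained image translator (its name and signature are assumptions, not the paper's API); the point is that each call only bridges a small pose gap, which is the regime the translator was trained for.

```python
def progressive_transfer(image, pose_sequence, translate):
    """Walk an image through a sequence of similar neighbouring poses.

    `translate(image, target_pose)` is a placeholder for the trained
    style-preserving image translator, which is only reliable when the
    input and target poses are close.
    """
    outputs = []
    current = image
    for pose in pose_sequence:
        # One small pose step per call; the output becomes the next input.
        current = translate(current, pose)
        outputs.append(current)
    return outputs
```

Each intermediate output is kept, so a single source image yields a whole group of novel-pose images, one per step of the Dijkstra-ordered sequence.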