A Multi-Drone Multi-View Dataset and Deep Learning Framework for Pedestrian Detection and Tracking

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Pedestrian detection and tracking from dynamic UAV perspectives face significant challenges, including large camera pose variations, severe occlusions, and frequent camera failures. Method: We introduce MATRIX, the first large-scale multi-view pedestrian tracking dataset to support dynamic deployment, captured synchronously by eight UAVs. We propose a deep learning framework that integrates real-time camera calibration with bird's-eye-view (BEV) feature fusion, employing feature-level multi-view alignment and joint modeling in BEV space to improve occlusion robustness and cross-view consistency. Transfer learning is incorporated to improve generalization, and systematic camera-degradation experiments validate fault tolerance. Results: In complex urban environments, the method achieves ~90% detection and tracking accuracy and an ~80% trajectory success rate. Performance degrades gracefully under partial camera failure, confirming practical deployability.

📝 Abstract
Multi-drone surveillance systems offer enhanced coverage and robustness for pedestrian tracking, yet existing approaches struggle with dynamic camera positions and complex occlusions. This paper introduces MATRIX (Multi-Aerial TRacking In compleX environments), a comprehensive dataset featuring synchronized footage from eight drones with continuously changing positions, together with a novel deep learning framework for multi-view detection and tracking. Unlike existing datasets that rely on static cameras or limited drone coverage, MATRIX provides a challenging scenario with 40 pedestrians and a significant architectural obstruction in an urban environment. Our framework addresses the unique challenges of dynamic drone-based surveillance through real-time camera calibration, feature-based image registration, and multi-view feature fusion in a bird's-eye-view (BEV) representation. Experimental results demonstrate that while static-camera methods maintain over 90% detection and tracking precision and accuracy in a simplified MATRIX environment (no obstruction, 10 pedestrians, and a much smaller observation area), their performance degrades significantly in the complex environment. Our proposed approach maintains robust performance with ~90% detection and tracking accuracy, and successfully tracks ~80% of trajectories under challenging conditions. Transfer learning experiments reveal strong generalization capabilities, with the pretrained model achieving much higher detection and tracking accuracy than a model trained from scratch. Additionally, systematic camera-dropout experiments reveal graceful performance degradation, demonstrating practical robustness for real-world deployments where camera failures may occur. The MATRIX dataset and framework provide essential benchmarks for advancing dynamic multi-view surveillance systems.
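The multi-view BEV feature fusion described in the abstract can be sketched roughly as follows. This is a hypothetical illustration, not the paper's code: it assumes each drone contributes a 2D feature map plus a ground-plane homography mapping BEV grid cells to image pixels, and fuses views by masked averaging, which also hints at why missing cameras degrade performance gracefully rather than catastrophically.

```python
# Hypothetical sketch of multi-view BEV feature fusion (illustrative only;
# function names, the nearest-neighbor warp, and mean fusion are assumptions,
# not the paper's actual architecture).
import numpy as np

def warp_to_bev(feat, H, bev_h, bev_w):
    """Nearest-neighbor warp of an (h, w, c) feature map into a BEV grid,
    where H maps homogeneous BEV cell coordinates to image coordinates."""
    h, w, c = feat.shape
    bev = np.zeros((bev_h, bev_w, c), dtype=feat.dtype)
    valid = np.zeros((bev_h, bev_w), dtype=bool)
    ys, xs = np.mgrid[0:bev_h, 0:bev_w]
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(bev_h * bev_w)], axis=0)
    img = H @ grid                    # project BEV cells into the image plane
    img = img[:2] / img[2:3]          # perspective divide
    u = np.round(img[0]).astype(int).reshape(bev_h, bev_w)
    v = np.round(img[1]).astype(int).reshape(bev_h, bev_w)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    bev[inside] = feat[v[inside], u[inside]]
    valid[inside] = True
    return bev, valid

def fuse_views(feats, homographies, bev_h=32, bev_w=32):
    """Average per-view BEV features. Dropped cameras simply contribute
    nothing to the sum, one simple way to model graceful degradation."""
    acc = np.zeros((bev_h, bev_w, feats[0].shape[-1]))
    count = np.zeros((bev_h, bev_w, 1))
    for feat, H in zip(feats, homographies):
        bev, valid = warp_to_bev(feat, H, bev_h, bev_w)
        acc += bev
        count += valid[..., None]
    return acc / np.maximum(count, 1)
```

In a learned pipeline the warp would be differentiable (bilinear sampling) and the fusion would likely be a learned aggregation rather than a plain mean, but the projection-then-fuse structure is the same.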
Problem

Research questions and friction points this paper is trying to address.

Addresses pedestrian detection challenges with dynamic drone cameras and occlusions
Develops multi-view deep learning framework for robust aerial surveillance
Provides benchmark dataset for complex multi-drone tracking in urban environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-drone dataset with dynamic camera positions
Deep learning framework using multi-view feature fusion
Real-time camera calibration and BEV representation
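The real-time calibration and feature-based registration listed above presumably reduce, per frame, to estimating a ground-plane homography from matched keypoints. Below is a minimal sketch of the classic Direct Linear Transform (DLT) step, assuming correspondences are already matched; the function name and interface are illustrative, not taken from the paper.

```python
# Hypothetical DLT homography estimation from >= 4 point correspondences.
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H such that dst ~ H @ src (homogeneous),
    from (N, 2) arrays of matched points, via SVD of the DLT system."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)      # null-space vector = flattened homography
    return H / H[2, 2]            # fix the projective scale
```

A practical system would wrap this in RANSAC to reject bad feature matches and refresh the estimate as the drones move, but the core least-squares step is the same.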