🤖 AI Summary
This paper reports on the AG-VPReID 2025 Challenge, a video-based competition on cross-view person re-identification (ReID) between unmanned aerial vehicles (UAVs) operating at high altitudes (80–120 m) and ground-level cameras. The task is difficult because of extreme viewpoint discrepancies, severe scale variations, and heavy occlusion. The challenge is built on the AG-VPReID dataset, which comprises 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras. Four international teams took part, with solutions ranging from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, achieves Rank-1 accuracies of 72.28% in the aerial-to-ground setting and 70.77% in the ground-to-aerial setting, surpassing existing baselines while underscoring how hard the dataset remains.
📝 Abstract
Person re-identification (ReID) across aerial and ground vantage points has become crucial for large-scale surveillance and public safety applications. Although significant progress has been made in ground-only scenarios, bridging the aerial-ground domain gap remains a formidable challenge due to extreme viewpoint differences, scale variations, and occlusions. Building upon the achievements of the AG-ReID 2023 Challenge, this paper introduces the AG-VPReID 2025 Challenge, the first large-scale video-based competition focused on high-altitude (80–120 m) aerial-ground ReID. The challenge is built on the new AG-VPReID dataset, which contains 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras. Four international teams participated, developing solutions that ranged from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, attained 72.28% Rank-1 accuracy in the aerial-to-ground ReID setting and 70.77% in the ground-to-aerial ReID setting, surpassing existing baselines while highlighting the dataset's complexity. For additional details, please refer to the official website at https://agvpreid25.github.io.