AG-VPReID 2025: Aerial-Ground Video-based Person Re-identification Challenge Results

📅 2025-06-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper reports the results of the AG-VPReID 2025 Challenge, a video-based person re-identification (ReID) competition that matches people between unmanned aerial vehicles (UAVs) flying at high altitude (80–120 m) and ground-level cameras. The task is difficult because of extreme viewpoint differences, severe scale variation, and occlusion. The challenge is built on the AG-VPReID dataset, which contains 3,027 identities, more than 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras. Four international teams took part, with solutions ranging from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, reached 72.28% Rank-1 accuracy in the aerial-to-ground setting and 70.77% in the ground-to-aerial setting, surpassing existing baselines while showing that the dataset remains challenging.

📝 Abstract

Person re-identification (ReID) across aerial and ground vantage points has become crucial for large-scale surveillance and public safety applications. Although significant progress has been made in ground-only scenarios, bridging the aerial-ground domain gap remains a formidable challenge due to extreme viewpoint differences, scale variations, and occlusions. Building upon the achievements of the AG-ReID 2023 Challenge, this paper introduces the AG-VPReID 2025 Challenge, the first large-scale video-based competition focused on high-altitude (80-120 m) aerial-ground ReID. Constructed on the new AG-VPReID dataset with 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras, the challenge featured four international teams. These teams developed solutions ranging from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, attained 72.28% Rank-1 accuracy in the aerial-to-ground ReID setting and 70.77% in the ground-to-aerial ReID setting, surpassing existing baselines while highlighting the dataset's complexity. For additional details, please refer to the official website at https://agvpreid25.github.io.
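
Rank-1 accuracy, the metric quoted in the abstract, is the fraction of query tracklets whose single best-scoring gallery match carries the same identity. The following is a minimal sketch of that computation, not the challenge's official evaluation code. The function names, the use of cosine similarity, and the toy two-dimensional embeddings are all assumptions made for illustration; real protocols often add steps (such as filtering gallery entries) that are omitted here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose most similar gallery entry has the same ID."""
    hits = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Index of the gallery entry with the highest similarity to this query.
        best = max(range(len(gallery_feats)),
                   key=lambda j: cosine(q_feat, gallery_feats[j]))
        hits += int(gallery_ids[best] == q_id)
    return hits / len(query_feats)


# Toy aerial-to-ground example (hypothetical embeddings and identities).
aerial_queries = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.9], [1.0, 0.2]]
aerial_ids = [1, 2, 2, 1]
ground_gallery = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
ground_ids = [1, 2, 4]  # identity 4 acts as a distractor

score = rank1_accuracy(aerial_queries, aerial_ids, ground_gallery, ground_ids)
print(f"Rank-1: {score:.2%}")
```

Swapping the roles of the two sets (ground tracklets as queries, aerial tracklets as gallery) gives the ground-to-aerial setting.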

Problem

Research questions and friction points this paper is trying to address.

Bridging aerial-ground domain gap in person ReID

Addressing viewpoint differences and scale variations

Improving accuracy in high-altitude aerial-ground ReID
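
To give a rough sense of the scale variation listed above, a pinhole camera model estimates how many pixels tall a person appears at a given distance. This is only an illustrative sketch: the focal lengths, the person height, and the ground-camera distance are hypothetical values, not parameters of the AG-VPReID cameras, and treating the stated 80-120 m altitude as the camera-to-person distance is a simplification.

```python
def apparent_height_px(person_height_m, distance_m, focal_length_px):
    """Pinhole estimate: image height = focal length * object height / distance.

    Assumes the person is roughly parallel to the image plane; a steep
    downward view would foreshorten the person further.
    """
    return focal_length_px * person_height_m / distance_m


PERSON_HEIGHT_M = 1.7      # assumed typical adult height
AERIAL_FOCAL_PX = 4000.0   # hypothetical UAV camera focal length in pixels
GROUND_FOCAL_PX = 1000.0   # hypothetical CCTV focal length in pixels

aerial_near = apparent_height_px(PERSON_HEIGHT_M, 80.0, AERIAL_FOCAL_PX)
aerial_far = apparent_height_px(PERSON_HEIGHT_M, 120.0, AERIAL_FOCAL_PX)
ground = apparent_height_px(PERSON_HEIGHT_M, 10.0, GROUND_FOCAL_PX)

print(f"UAV at 80 m:   {aerial_near:.1f} px")
print(f"UAV at 120 m:  {aerial_far:.1f} px")
print(f"CCTV at 10 m:  {ground:.1f} px")
```

Under these assumed numbers, the same person spans noticeably fewer pixels in the aerial view than in the ground view, and the size shrinks further as altitude grows.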

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stream architectures for aerial-ground ReID

Transformer-based temporal reasoning techniques

Physics-informed modeling to address viewpoint differences
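
The source does not describe how any team implemented temporal reasoning. As a generic illustration of the idea behind attention over a tracklet, the sketch below pools per-frame feature vectors into one tracklet embedding using scaled dot-product attention with a single query vector. The function name, the query vector, and the toy features are hypothetical; this is one building block of a transformer, not X-TFCLIP or any submitted method.

```python
import math


def attention_pool(frames, query):
    """Pool per-frame features into one vector with softmax attention.

    frames: list of equal-length feature vectors, one per video frame.
    query:  vector of the same length that scores each frame.
    Returns (pooled_vector, attention_weights).
    """
    dim = len(query)
    # Scaled dot-product score of each frame against the query.
    scores = [sum(f * q for f, q in zip(frame, query)) / math.sqrt(dim)
              for frame in frames]
    # Numerically stable softmax over the frame scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of frame features.
    pooled = [sum(w * frame[k] for w, frame in zip(weights, frames))
              for k in range(dim)]
    return pooled, weights


# Toy tracklet with three frames (hypothetical 2-D features).
tracklet = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_vec = [2.0, 0.0]  # hypothetical query; in practice it would be learned

embedding, attn = attention_pool(tracklet, query_vec)
print("weights:", [round(w, 3) for w in attn])
print("embedding:", [round(v, 3) for v in embedding])
```

Compared with plain averaging, this weighting lets frames that align with the query contribute more to the tracklet embedding, which is one way to downweight occluded or low-quality frames.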
