AG-VPReID 2025: Aerial-Ground Video-based Person Re-identification Challenge Results

📅 2025-06-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging cross-view person re-identification (ReID) problem between unmanned aerial vehicles (UAVs) operating at high altitudes (80–120 m) and ground-level CCTV cameras. To tackle extreme viewpoint discrepancies, severe scale variations, and heavy occlusion, the authors introduce AG-VPReID, the first large-scale video-based aerial-ground ReID benchmark, comprising 3,027 identities and over 13,500 tracklets. The winning entry, X-TFCLIP, is a multi-stream network that fuses video inputs from UAVs, ground CCTV, and wearable cameras; incorporates Transformers to model temporal dynamics; and jointly leverages CLIP-based semantic alignment with physics-informed constraints, including perspective projection and scale priors, to guide feature learning. Evaluated in the UAV→ground and ground→UAV settings, X-TFCLIP achieves Rank-1 accuracies of 72.28% and 70.77%, respectively, substantially outperforming existing baselines. These results demonstrate the effectiveness of integrating temporal modeling with physical awareness for cross-view ReID.
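The summary reports Rank-1 accuracy for both retrieval directions. As background, the sketch below shows how Rank-1 accuracy is typically computed in cross-view ReID (e.g. UAV tracklets as queries, ground tracklets as gallery): each query embedding is matched to its nearest gallery embedding by cosine similarity, and we count how often the top match shares the query's identity. This is a generic illustration, not the challenge's official evaluation code; the function name and toy data are ours.

```python
import numpy as np

def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose nearest gallery embedding (by cosine
    similarity) has the same identity label. Generic ReID metric sketch."""
    # L2-normalise so the dot product equals cosine similarity.
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = q @ g.T                      # shape: (num_query, num_gallery)
    best = sims.argmax(axis=1)          # index of the top-1 gallery match
    return float(np.mean(query_ids == gallery_ids[best]))

# Toy example: 3 query tracklets (aerial) vs 4 gallery tracklets (ground).
rng = np.random.default_rng(0)
query_ids = np.array([0, 1, 2])
gallery_ids = np.array([0, 1, 2, 3])
gallery_feats = rng.normal(size=(4, 8))
# Each query is a slightly noisy copy of its matching gallery embedding.
query_feats = gallery_feats[:3] + 0.05 * rng.normal(size=(3, 8))
print(rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids))  # → 1.0
```

Full benchmark evaluation would also report mAP and higher ranks (Rank-5, Rank-10), but the nearest-neighbour ranking above is the core of the Rank-1 numbers quoted.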

📝 Abstract
Person re-identification (ReID) across aerial and ground vantage points has become crucial for large-scale surveillance and public safety applications. Although significant progress has been made in ground-only scenarios, bridging the aerial-ground domain gap remains a formidable challenge due to extreme viewpoint differences, scale variations, and occlusions. Building upon the achievements of the AG-ReID 2023 Challenge, this paper introduces the AG-VPReID 2025 Challenge - the first large-scale video-based competition focused on high-altitude (80-120m) aerial-ground ReID. Constructed on the new AG-VPReID dataset with 3,027 identities, over 13,500 tracklets, and approximately 3.7 million frames captured from UAVs, CCTV, and wearable cameras, the challenge featured four international teams. These teams developed solutions ranging from multi-stream architectures to transformer-based temporal reasoning and physics-informed modeling. The leading approach, X-TFCLIP from UAM, attained 72.28% Rank-1 accuracy in the aerial-to-ground ReID setting and 70.77% in the ground-to-aerial ReID setting, surpassing existing baselines while highlighting the dataset's complexity. For additional details, please refer to the official website at https://agvpreid25.github.io.
Problem

Research questions and friction points this paper is trying to address.

Bridging aerial-ground domain gap in person ReID
Addressing viewpoint differences and scale variations
Improving accuracy in high-altitude aerial-ground ReID
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stream architectures for aerial-ground ReID
Transformer-based temporal reasoning techniques
Physics-informed modeling to address viewpoint differences
Kien Nguyen
Institute for Advanced Academic Research & Graduate School of Informatics, Chiba University
IoT, networking, wireless, network virtualization, SDN
Clinton Fookes
Queensland University of Technology
Computer Vision, Machine Learning, Signal Processing, AI, Video Analytics/Biometrics/Medical Imaging
Sridha Sridharan
Professor
computer vision, machine learning, speaker recognition, biometrics, image processing
Huy Nguyen
Queensland University of Technology, Australia
Feng Liu
Drexel University, USA
Xiaoming Liu
Michigan State University, USA
Arun Ross
Professor | Michigan State University
Biometrics, Computer Vision, Pattern Recognition, Iris Recognition
Dana Michalski
Department of Defence, Australia
Tamás Endrei
Universidad Autónoma de Madrid, Spain; Pázmány Péter Catholic University, Hungary
Ivan DeAndres-Tame
Universidad Autónoma de Madrid, Spain
Ruben Tolosana
Associate Professor, Universidad Autonoma de Madrid
Machine Learning, Pattern Recognition, DeepFakes, Biometrics, Human-Computer Interaction
Ruben Vera-Rodriguez
Associate Professor, Universidad Autonoma de Madrid
Biometrics, Machine Learning, Human-Computer Interaction, Behavioral Biometrics, Soft Biometrics
Aythami Morales
Universidad Autónoma de Madrid, Spain
Julian Fierrez
Universidad Autónoma de Madrid, Spain
Javier Ortega-Garcia
Professor of Signal Processing, Universidad Autonoma de Madrid - Spain
Biometrics, Signal Processing, Pattern Matching
Zijing Gong
Dalian University of Technology, China
Yuhao Wang
Dalian University of Technology, China
Xuehu Liu
Wuhan University of Technology; Dalian University of Technology, China
Pingping Zhang
Dalian University of Technology, China
Md Rashidunnabi
PhD Researcher
Computer Vision
Hugo Proença
IT: Instituto de Telecomunicacoes, University of Beira Interior, Portugal
Kailash A. Hambarde
IT: Instituto de Telecomunicacoes, University of Beira Interior, Portugal
Saeid Rezaei
University College Cork, Ireland