VReID-XFD: Video-based Person Re-identification at Extreme Far Distance Challenge Results

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the severe performance degradation in cross-view (aerial-to-ground) person re-identification under extreme long-range conditions, caused by drastic resolution loss, significant viewpoint shifts, motion blur, and clothing variations. To this end, we formally define the task for the first time and introduce VReID-XFD, a new video benchmark and challenge built upon the DetReIDX dataset, which encompasses 371 identities and 11,288 tracklets. The benchmark features a rigorously identity-disjoint evaluation protocol, multi-perspective captures from altitudes of 5.8–120 meters with depression angles ranging from 30° to 90°, trajectory-level annotations, and rich physical metadata. The accompanying challenge attracted 10 participating teams, with the top-performing method, SAS-PReID, achieving only 43.93% mAP, which highlights the task's difficulty and the limitations of current approaches, thereby establishing a foundation for future research.

📝 Abstract
Person re-identification (ReID) across aerial and ground views at extreme far distances introduces a distinct operating regime where severe resolution degradation, extreme viewpoint changes, unstable motion cues, and clothing variation jointly undermine the appearance-based assumptions of existing ReID systems. To study this regime, we introduce VReID-XFD, a video-based benchmark and community challenge for extreme far-distance (XFD) aerial-to-ground person re-identification. VReID-XFD is derived from the DetReIDX dataset and comprises 371 identities, 11,288 tracklets, and 11.75 million frames, captured across altitudes from 5.8 m to 120 m, viewing angles from oblique (30 degrees) to nadir (90 degrees), and horizontal distances up to 120 m. The benchmark supports aerial-to-aerial, aerial-to-ground, and ground-to-aerial evaluation under strict identity-disjoint splits, with rich physical metadata. The VReID-XFD-25 Challenge attracted 10 teams with hundreds of submissions. Systematic analysis reveals monotonic performance degradation with altitude and distance, a universal disadvantage of nadir views, and a trade-off between peak performance and robustness. Even the best-performing SAS-PReID method achieves only 43.93 percent mAP in the aerial-to-ground setting. The dataset, annotations, and official evaluation protocols are publicly available at https://www.it.ubi.pt/DetReIDX/.
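The abstract reports results as mAP under identity-disjoint query/gallery splits. As a rough illustration only (this is not the official VReID-XFD evaluation code; the function names and the toy similarity matrix below are hypothetical), mean average precision over ranked retrieval lists can be sketched as:

```python
def average_precision(ranked_ids, query_id):
    """AP for one query, given gallery identity labels ranked by
    descending similarity to the query."""
    hits = 0
    precision_sum = 0.0
    for rank, gid in enumerate(ranked_ids, start=1):
        if gid == query_id:
            hits += 1
            precision_sum += hits / rank  # precision at this correct rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision(similarity, query_ids, gallery_ids):
    """similarity[i][j] = matching score between query i and gallery item j.
    Returns the mean of per-query average precisions (mAP)."""
    aps = []
    for i, qid in enumerate(query_ids):
        # rank gallery items by descending similarity to query i
        order = sorted(range(len(gallery_ids)),
                       key=lambda j: -similarity[i][j])
        ranked = [gallery_ids[j] for j in order]
        aps.append(average_precision(ranked, qid))
    return sum(aps) / len(aps)
```

Under the identity-disjoint protocol described above, the identities appearing in `query_ids` and `gallery_ids` would never overlap with those used for training, so the metric measures generalization to unseen people rather than memorization.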
Problem

Research questions and friction points this paper is trying to address.

person re-identification
extreme far distance
aerial-to-ground
video-based ReID
viewpoint variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

video-based person re-identification
extreme far distance
aerial-to-ground
benchmark dataset
viewpoint variation
Kailash A. Hambarde
IT - Instituto de Telecomunicações, Portugal; University of Beira Interior, Portugal
Hugo Proença
IT - Instituto de Telecomunicações, Portugal; University of Beira Interior, Portugal
MD. Rashidunnabi
University of Beira Interior, Portugal
Pranita Samale
University of Beira Interior, Portugal
Qiwei Yang
Dalian University of Technology, China
Pingping Zhang
Dalian University of Technology, China
Zijing Gong
Dalian University of Technology, China
Yuhao Wang
Dalian University of Technology, China
Xi Zhang
Dalian University of Technology, China
Ruoshui Qu
Dalian University of Technology, China
Qiaoyun He
Dalian University of Technology, China
Yuhang Zhang
Dalian University of Technology, China
Thi Ngoc Ha Nguyen
University of Information Technology, VNU-HCM, Vietnam
Tien-Dung Mai
University of Information Technology, VNU-HCM, Vietnam
Cheng-Jun Kang
National Cheng Kung University, Taiwan
Yu-Fan Lin
Institute of Data Science, National Cheng Kung University, Taiwan
Jin-Hui Jiang
National Yang Ming Chiao Tung University, Taiwan
Chih-Chung Hsu
Institute of Intelligent Systems, College of AI, National Yang Ming Chiao Tung University, Taiwan
Tamás Endrei
Pázmány Péter Catholic University, Budapest, Hungary
György Cserey
Faculty of IT and Bionics, Pázmány Péter Catholic University, Budapest, Hungary
Ashwat Rajbhandari
Arizona State University, Arizona, USA