🤖 AI Summary
Existing cross-modal person re-identification (Re-ID) datasets are limited to ground-level views, hindering their applicability to all-day, cross-platform scenarios, particularly aerial-to-ground settings. Method: We introduce AG-VPReID.VIR, the first aerial-ground cross-modal video person Re-ID dataset, comprising 1,837 identities and 4,861 tracklets, and supporting day/night operation, cross-view matching, cross-modal alignment, and dynamic spatiotemporal association. We further propose TCC-VPReID, a three-stream network that integrates style-robust feature learning, memory-augmented cross-view adaptation, and temporal intermediary-guided modeling to jointly address platform heterogeneity and modality discrepancy. Contribution/Results: Extensive experiments under multiple evaluation protocols demonstrate significant performance gains over state-of-the-art methods, validating both the difficulty of the dataset and the effectiveness of the model. AG-VPReID.VIR establishes a new benchmark for all-weather, cross-perspective intelligent perception and provides a principled technical pathway toward robust aerial-ground Re-ID.
📝 Abstract
Person re-identification (Re-ID) across visible and infrared modalities is crucial for 24-hour surveillance systems, but existing datasets primarily focus on ground-level perspectives. While ground-based IR systems offer nighttime capabilities, they suffer from occlusion, limited coverage, and vulnerability to obstructions, problems that aerial perspectives uniquely solve. To address these limitations, we introduce AG-VPReID.VIR, the first aerial-ground cross-modality video-based person Re-ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV-mounted and fixed CCTV cameras in RGB and infrared modalities. AG-VPReID.VIR presents unique challenges including cross-viewpoint variations, modality discrepancies, and temporal dynamics. Additionally, we propose TCC-VPReID, a novel three-stream architecture designed to address the joint challenges of cross-platform and cross-modality person Re-ID. Our approach bridges the domain gaps between aerial-ground perspectives and RGB-IR modalities through style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling. Experiments show that AG-VPReID.VIR presents distinctive challenges compared to existing datasets, with our TCC-VPReID framework achieving significant performance gains across multiple evaluation protocols. Dataset and code are available at https://github.com/agvpreid25/AG-VPReID.VIR.
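To make the three-stream idea concrete, here is a minimal, hypothetical sketch of how tracklet features could flow through the three branches the abstract names (style-robust, memory-based cross-view, and temporal-intermediary) before fusion. All function names, dimensions, and operations below are illustrative assumptions, not the actual TCC-VPReID implementation; see the linked repository for the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_robust_stream(x):
    # Hypothetical stand-in: instance-normalize features to suppress
    # platform-specific style statistics (aerial vs. ground appearance).
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-6)

def memory_cross_view_stream(x, memory):
    # Hypothetical stand-in: softly attend over a small memory bank of
    # cross-view prototypes and mix the retrieved prototype into the feature.
    sims = x @ memory.T                                   # (batch, num_prototypes)
    weights = np.exp(sims) / np.exp(sims).sum(axis=-1, keepdims=True)
    return 0.5 * x + 0.5 * (weights @ memory)

def temporal_intermediary_stream(frames):
    # Hypothetical stand-in: pool frame features over time into one clip
    # feature, acting as a crude intermediary shared by RGB and IR clips.
    return frames.mean(axis=1)

# Toy input: a batch of 2 tracklets, 8 frames each, 16-dim frame features.
frames = rng.normal(size=(2, 8, 16))
memory = rng.normal(size=(4, 16))                         # 4 cross-view prototypes

clip_feat = temporal_intermediary_stream(frames)          # (2, 16)
fused = np.concatenate(
    [style_robust_stream(clip_feat),
     memory_cross_view_stream(clip_feat, memory),
     clip_feat],
    axis=-1,
)                                                         # (2, 48)
print(fused.shape)
```

The fused embedding would then feed a Re-ID matching head; in practice each stream is a learned network rather than the fixed transforms used here.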