🤖 AI Summary
Existing cross-modal person re-identification (Re-ID) datasets are limited to ground-level views, hindering their applicability to all-day, cross-platform scenarios, particularly aerial-to-ground settings. Method: We introduce AG-VPReID.VIR, the first aerial-ground cross-modal video person Re-ID dataset, comprising 1,837 identities and 4,861 tracklets, and supporting day/night operation, cross-view matching, cross-modal alignment, and dynamic spatiotemporal association. We further propose TCC-VPReID, a three-stream network that integrates style-robust feature learning, memory-augmented cross-view adaptation, and temporal intermediary-guided modeling to jointly address platform heterogeneity and modality discrepancy. Contribution/Results: Extensive experiments under multiple evaluation protocols demonstrate significant performance gains over state-of-the-art methods, validating both the difficulty of the dataset and the effectiveness of the model. AG-VPReID.VIR establishes a new benchmark for all-weather, cross-perspective intelligent perception and provides a principled technical pathway toward robust aerial-ground Re-ID.
📝 Abstract
Person re-identification (Re-ID) across visible and infrared modalities is crucial for 24-hour surveillance systems, but existing datasets primarily focus on ground-level perspectives. While ground-based IR systems offer nighttime capabilities, they suffer from occlusion, limited coverage, and vulnerability to obstructions, problems that aerial perspectives uniquely solve. To address these limitations, we introduce AG-VPReID.VIR, the first aerial-ground cross-modality video-based person Re-ID dataset. This dataset captures 1,837 identities across 4,861 tracklets (124,855 frames) using both UAV-mounted and fixed CCTV cameras in RGB and infrared modalities. AG-VPReID.VIR presents unique challenges including cross-viewpoint variations, modality discrepancies, and temporal dynamics. Additionally, we propose TCC-VPReID, a novel three-stream architecture designed to address the joint challenges of cross-platform and cross-modality person Re-ID. Our approach bridges the domain gaps between aerial-ground perspectives and RGB-IR modalities through style-robust feature learning, memory-based cross-view adaptation, and intermediary-guided temporal modeling. Experiments show that AG-VPReID.VIR presents distinctive challenges compared to existing datasets, with our TCC-VPReID framework achieving significant performance gains across multiple evaluation protocols. Dataset and code are available at https://github.com/agvpreid25/AG-VPReID.VIR.
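To make the three-stream idea concrete, here is a minimal, hypothetical sketch of how tracklet features could flow through the three branches the abstract names (style-robust, memory-based cross-view, and temporal-intermediary) before fusion. All function names, dimensions, and operations below are illustrative assumptions, not the actual TCC-VPReID implementation; see the linked repository for the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

def style_robust_stream(x):
    # Hypothetical stand-in: instance-normalize features to suppress
    # platform-specific style statistics (aerial vs. ground appearance).
    return (x - x.mean(axis=-1, keepdims=True)) / (x.std(axis=-1, keepdims=True) + 1e-6)

def memory_cross_view_stream(x, memory):
    # Hypothetical stand-in: softly attend over a small memory bank of
    # cross-view prototypes and mix the retrieved prototype into the feature.
    sims = x @ memory.T                                   # (batch, num_prototypes)
    weights = np.exp(sims) / np.exp(sims).sum(axis=-1, keepdims=True)
    return 0.5 * x + 0.5 * (weights @ memory)

def temporal_intermediary_stream(frames):
    # Hypothetical stand-in: pool frame features over time into one clip
    # feature, acting as a crude intermediary shared by RGB and IR clips.
    return frames.mean(axis=1)

# Toy input: a batch of 2 tracklets, 8 frames each, 16-dim frame features.
frames = rng.normal(size=(2, 8, 16))
memory = rng.normal(size=(4, 16))                         # 4 cross-view prototypes

clip_feat = temporal_intermediary_stream(frames)          # (2, 16)
fused = np.concatenate(
    [style_robust_stream(clip_feat),
     memory_cross_view_stream(clip_feat, memory),
     clip_feat],
    axis=-1,
)                                                         # (2, 48)
print(fused.shape)
```

The fused embedding would then feed a Re-ID matching head; in practice each stream is a learned network rather than the fixed transforms used here.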