GSAlign: Geometric and Semantic Alignment Network for Aerial-Ground Person Re-Identification

📅 2025-10-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Aerial-ground person re-identification (AG-ReID) suffers from severe cross-view spatial and semantic misalignment due to extreme viewpoint discrepancies, drastic pose variations, and heavy occlusion. To address this, we propose a joint geometric and semantic alignment framework. First, we design a Learnable Thin-Plate Spline (LTPS) module that performs keypoint-driven feature deformation correction. Second, we introduce a Dynamic Alignment Module (DAM) integrated with visibility-aware masks to achieve fine-grained semantic alignment. Our method achieves new state-of-the-art performance on all four protocols of the CARGO benchmark, improving mAP and Rank-1 accuracy on the aerial-ground setting by 18.8% and 16.8%, respectively. To the best of our knowledge, this is the first work to systematically unify geometric deformation modeling with visibility-aware semantic alignment within a single AG-ReID framework, effectively mitigating cross-view feature mismatch.

📝 Abstract
Aerial-Ground person re-identification (AG-ReID) is an emerging yet challenging task that aims to match pedestrian images captured from drastically different viewpoints, typically from unmanned aerial vehicles (UAVs) and ground-based surveillance cameras. The task poses significant challenges due to extreme viewpoint discrepancies, occlusions, and domain gaps between aerial and ground imagery. While prior works have made progress by learning cross-view representations, they remain limited in handling severe pose variations and spatial misalignment. To address these issues, we propose a Geometric and Semantic Alignment Network (GSAlign) tailored for AG-ReID. GSAlign introduces two key components to jointly tackle geometric distortion and semantic misalignment in aerial-ground matching: a Learnable Thin Plate Spline (LTPS) Module and a Dynamic Alignment Module (DAM). The LTPS module adaptively warps pedestrian features based on a set of learned keypoints, effectively compensating for geometric variations caused by extreme viewpoint changes. In parallel, the DAM estimates visibility-aware representation masks that highlight visible body regions at the semantic level, thereby alleviating the negative impact of occlusions and partial observations in cross-view correspondence. A comprehensive evaluation on CARGO with four matching protocols demonstrates the effectiveness of GSAlign, achieving significant improvements of +18.8% in mAP and +16.8% in Rank-1 accuracy over previous state-of-the-art methods on the aerial-ground setting. The code is available at: https://github.com/stone96123/GSAlign
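The abstract describes the LTPS module as warping pedestrian features with a thin-plate spline driven by learned keypoints. As an illustrative sketch only (not the authors' implementation, which operates on learned feature maps), the classical thin-plate spline fit underlying such a warp can be written in NumPy; the function names `fit_tps` and `warp_points` are made up for this example:

```python
import numpy as np

def tps_kernel(r2):
    """TPS radial basis U(r) = r^2 * log(r^2), with U(0) = 0."""
    out = np.zeros_like(r2)
    mask = r2 > 0
    out[mask] = r2[mask] * np.log(r2[mask])
    return out

def fit_tps(src, dst):
    """Solve TPS coefficients mapping src control points (n, 2) to dst (n, 2)."""
    n = src.shape[0]
    d2 = ((src[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    K = tps_kernel(d2)                               # (n, n) kernel matrix
    P = np.hstack([np.ones((n, 1)), src])            # (n, 3) affine part
    L = np.zeros((n + 3, n + 3))
    L[:n, :n] = K
    L[:n, n:] = P
    L[n:, :n] = P.T
    Y = np.zeros((n + 3, 2))
    Y[:n] = dst
    return np.linalg.solve(L, Y)                     # (n+3, 2): [weights; affine]

def warp_points(pts, src, coef):
    """Apply the fitted TPS mapping to arbitrary points (m, 2)."""
    d2 = ((pts[:, None, :] - src[None, :, :]) ** 2).sum(-1)
    U = tps_kernel(d2)                               # (m, n)
    P = np.hstack([np.ones((len(pts), 1)), pts])     # (m, 3)
    return U @ coef[: len(src)] + P @ coef[len(src):]
```

By construction the TPS interpolates the control points exactly, so moving one keypoint smoothly deforms the surrounding coordinate grid; in a learnable variant, the keypoint displacements would be predicted by the network rather than given.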
Problem

Research questions and friction points this paper is trying to address.

Matching pedestrian images between aerial drones and ground cameras
Addressing extreme viewpoint discrepancies and spatial misalignment
Handling geometric distortions and semantic misalignment in cross-view matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

LTPS module warps pedestrian features to compensate for geometric variations
DAM estimates visibility-aware masks for fine-grained semantic alignment
GSAlign jointly tackles geometric distortion and semantic misalignment
Authors

Qiao Li — Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University
Jie Li — Xiamen University
Yukang Zhang — Xiamen University
Lei Tan — National University of Singapore
Jing Chen — Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, School of Cyber Science and Engineering, Wuhan University
Jiayi Ji — Rutgers University