🤖 AI Summary
In unsupervised visible-infrared person re-identification (USVI-ReID), a large modality gap and coarse single-center clustering, which ignores fine-grained identity variations within each cluster, limit performance. To address these challenges, this paper proposes a hierarchical identity learning framework. It introduces multi-center contrastive learning, where secondary clustering constructs multiple memory units to capture intra-identity diversity; designs a bidirectional reverse-selection transmission strategy for pseudo-labels to improve the reliability of cross-modal sample matching; and incorporates a modality-aware feature alignment mechanism to strengthen modality invariance. Evaluated on SYSU-MM01 and RegDB, the method significantly outperforms existing unsupervised approaches, with gains of 3.2–5.7% in Rank-1 accuracy and mAP. These results validate the effectiveness of fine-grained identity modeling and reliable cross-modal alignment for USVI-ReID.
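A minimal sketch of the bidirectional pseudo-label idea (NumPy; the function name and the mutual-nearest-neighbor criterion are an illustrative reading, not the paper's exact BRST rule): a visible cluster and an infrared cluster are linked only if each selects the other as its nearest match, which filters out one-sided, unreliable correspondences.

```python
import numpy as np

def bidirectional_match(vis_centers, ir_centers):
    """Keep a (visible, infrared) cluster pair only when the match is mutual:
    i's nearest IR cluster is j AND j's nearest visible cluster is i."""
    sim = vis_centers @ ir_centers.T        # cosine similarity (rows L2-normalized)
    v2i = sim.argmax(axis=1)                # forward selection: visible -> infrared
    i2v = sim.argmax(axis=0)                # reverse selection: infrared -> visible
    return [(i, int(j)) for i, j in enumerate(v2i) if i2v[j] == i]
```

For example, if the infrared cluster centers are a permutation of the visible ones, every pair is mutual and all identities are matched; a visible cluster whose best infrared match does not pick it back is simply left unpaired rather than forced into a noisy correspondence.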
📝 Abstract
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to learn modality-invariant image features from unlabeled cross-modal person datasets by reducing the modality gap while minimizing reliance on costly manual annotations. Existing methods typically address USVI-ReID using cluster-based contrastive learning, which represents a person by a single cluster center. However, they primarily focus on the commonality of images within each cluster while neglecting the finer-grained differences among them. To address this limitation, we propose a Hierarchical Identity Learning (HIL) framework. Since each cluster may contain several smaller sub-clusters that reflect fine-grained variations among images, we generate multiple memories for each existing coarse-grained cluster via secondary clustering. Additionally, we propose Multi-Center Contrastive Learning (MCCL) to refine representations, enhancing intra-modal clustering and reducing cross-modal discrepancies. To further improve cross-modal matching quality, we design a Bidirectional Reverse Selection Transmission (BRST) mechanism, which establishes reliable cross-modal correspondences by performing bidirectional matching of pseudo-labels. Extensive experiments conducted on the SYSU-MM01 and RegDB datasets demonstrate that the proposed method outperforms existing approaches. The source code is available at: https://github.com/haonanshi0125/HIL.