Hierarchical Identity Learning for Unsupervised Visible-Infrared Person Re-Identification

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
In unsupervised visible-infrared person re-identification (USVI-ReID), performance is hindered by the large modality gap and by coarse single-center clustering, which ignores fine-grained identity variations within each cluster. To address these challenges, this paper proposes a Hierarchical Identity Learning (HIL) framework. It introduces Multi-Center Contrastive Learning, in which secondary clustering constructs multiple memory units to capture intra-identity diversity; designs a Bidirectional Reverse Selection Transmission strategy to improve the reliability of cross-modal sample matching; and incorporates a modality-aware feature alignment mechanism to strengthen modality invariance. Evaluated on SYSU-MM01 and RegDB, the method significantly outperforms existing unsupervised approaches, with gains of 3.2–5.7% in Rank-1 accuracy and mAP. These results validate the effectiveness of fine-grained identity modeling and robust cross-modal alignment for USVI-ReID.

📝 Abstract
Unsupervised visible-infrared person re-identification (USVI-ReID) aims to learn modality-invariant image features from unlabeled cross-modal person datasets by reducing the modality gap while minimizing reliance on costly manual annotations. Existing methods typically address USVI-ReID using cluster-based contrastive learning, which represents a person by a single cluster center. However, they primarily focus on the commonality of images within each cluster while neglecting the finer-grained differences among them. To address this limitation, we propose a Hierarchical Identity Learning (HIL) framework. Since each cluster may contain several smaller sub-clusters that reflect fine-grained variations among images, we generate multiple memories for each existing coarse-grained cluster via secondary clustering. Additionally, we propose Multi-Center Contrastive Learning (MCCL) to refine representations, enhancing intra-modal clustering and minimizing cross-modal discrepancies. To further improve cross-modal matching quality, we design a Bidirectional Reverse Selection Transmission (BRST) mechanism, which establishes reliable cross-modal correspondences by performing bidirectional matching of pseudo-labels. Extensive experiments conducted on the SYSU-MM01 and RegDB datasets demonstrate that the proposed method outperforms existing approaches. The source code is available at: https://github.com/haonanshi0125/HIL.
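The abstract describes the core mechanism only at a high level: each coarse cluster is split by secondary clustering into sub-clusters whose centers serve as multiple memory units, and a multi-center contrastive loss pulls a query toward its own cluster's sub-centers. The sketch below is a rough, hypothetical illustration of that idea, not the paper's implementation; the clusterer, `n_sub`, the temperature, and the hardest-positive choice are all assumptions.

```python
import numpy as np

def kmeans(feats, k, iters=20, seed=0):
    # Plain Lloyd's k-means, standing in for any off-the-shelf clusterer.
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = ((feats[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            members = feats[assign == j]
            if len(members):
                centers[j] = members.mean(0)
    return centers

def build_multi_center_memory(features, coarse_labels, n_sub=3):
    """Secondary clustering: split each coarse cluster into up to n_sub
    sub-clusters and store their L2-normalized centers as memory units."""
    memory = {}
    for c in np.unique(coarse_labels):
        feats = features[coarse_labels == c]
        centers = kmeans(feats, min(n_sub, len(feats)))
        memory[c] = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    return memory

def multi_center_contrastive_loss(query, memory, pos_label, temperature=0.05):
    """InfoNCE-style loss: the positive is the query's closest sub-center
    within its own cluster; all other sub-centers form the contrast set."""
    q = query / np.linalg.norm(query)
    sims, pos_mask = [], []
    for c, centers in memory.items():
        sims.append(centers @ q / temperature)
        pos_mask.extend([c == pos_label] * len(centers))
    sims, pos_mask = np.concatenate(sims), np.array(pos_mask)
    pos = sims[pos_mask].max()  # hardest (closest) positive sub-center
    return float(np.log(np.exp(sims).sum()) - pos)
```

Because the positive sub-center's similarity appears inside the log-sum-exp denominator, the loss is always non-negative and vanishes only when the query aligns perfectly with one of its own sub-centers and is orthogonal to all others.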
Problem

Research questions and friction points this paper is trying to address.

Unsupervised cross-modal person re-identification without manual annotations
Reducing modality gap between visible and infrared images
Addressing fine-grained variations within clustered identity representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Identity Learning framework
Multi-Center Contrastive Learning method
Bidirectional Reverse Selection Transmission mechanism
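The BRST mechanism is described only as "bidirectional matching of pseudo-labels" to establish reliable cross-modal correspondences. A common way to realize bidirectional selection is mutual-nearest-neighbor matching between the two modalities' cluster centers; the snippet below sketches that simplified reading, with cosine similarity and the function name as assumptions rather than the paper's actual procedure.

```python
import numpy as np

def bidirectional_match(vis_centers, ir_centers):
    """Keep only cross-modal cluster pairs that select each other in both
    directions (mutual nearest neighbors under cosine similarity).
    A simplified, hypothetical stand-in for the paper's BRST mechanism."""
    v = vis_centers / np.linalg.norm(vis_centers, axis=1, keepdims=True)
    r = ir_centers / np.linalg.norm(ir_centers, axis=1, keepdims=True)
    sim = v @ r.T
    v2r = sim.argmax(1)  # each visible cluster's best infrared match
    r2v = sim.argmax(0)  # each infrared cluster's best visible match
    # A pair survives only if the selection is mutual in both directions.
    return [(i, j) for i, j in enumerate(v2r) if r2v[j] == i]
```

One-directional nearest-neighbor assignment would link every visible cluster to some infrared cluster, including unreliable matches; requiring agreement in both directions discards pairs where either side prefers a different partner.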
Haonan Shi
Case Western Reserve University
Machine Learning, Privacy and Security
Yubin Wang
Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
De Cheng
Associate Professor, Xidian University
Computer Vision, Deep Learning, Machine Learning, Data Compression
Lingfeng He
State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, Shanxi, P. R. China
Nannan Wang
Professor, Xidian University
Computer Vision, Machine Learning, Pattern Recognition
Xinbo Gao
State Key Laboratory of Integrated Services Networks, School of Telecommunications Engineering, Xidian University, Xi’an 710071, Shanxi, P. R. China