🤖 AI Summary
This paper addresses two key challenges in cross-spectral (visible/infrared) person re-identification: the difficulty of matching across spectral domains, and range-induced occlusion. It adapts a Vision Transformer pretrained on visible imagery to cross-spectral matching through transfer learning. The core contributions are: (1) integrating Side Information Embedding (SIE) and showing that encoding camera identity alone, without explicit spectral-domain information, improves cross-spectral robustness; and (2) an analysis of range-induced occlusion in visible-infrared Re-ID using the existing IARPA Janus Benchmark Multi-Domain Face (IJB-MDF) dataset, whose visible and infrared images captured at various distances enable cross-range, cross-spectral evaluation. Experiments show that encoding only camera information via SIE achieves state-of-the-art performance on LLCM, outperforming methods that model the spectral domain explicitly. SIE also generalizes better under occlusion.
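As an illustration of how a side information embedding enters a ViT, here is a minimal sketch in the style of TransReID-like SIE. It is an assumption, not the authors' code. A learned vector is looked up per camera (or per domain, or per camera-domain pair) and added, scaled by a coefficient `lam`, to every token alongside the positional embedding. The names `add_side_information` and `side_index`, the flattened camera-domain indexing, and the default `lam` are all hypothetical.

```python
import numpy as np

def add_side_information(tokens, pos_embed, sie_table, side_id, lam=1.0):
    """Add positional embedding plus a scaled side-information embedding.

    tokens:    (num_tokens, dim) array, [CLS] token followed by patch tokens
    pos_embed: (num_tokens, dim) learned positional embedding
    sie_table: (num_side_values, dim) learned table, one row per side value
    side_id:   integer row index (e.g. the camera that captured the image)
    lam:       scaling coefficient for the side embedding (hypothetical default)
    """
    # The same side vector is broadcast to every token in the sequence.
    return tokens + pos_embed + lam * sie_table[side_id]

def side_index(camera_id, domain_id, num_domains, mode="camera"):
    """Map camera/domain labels to a row of the SIE table (hypothetical scheme)."""
    if mode == "camera":
        return camera_id            # table needs num_cameras rows
    if mode == "domain":
        return domain_id            # table needs num_domains rows (e.g. VIS, IR)
    if mode == "camera+domain":
        # One row per (camera, domain) pair, flattened row-major;
        # table needs num_cameras * num_domains rows.
        return camera_id * num_domains + domain_id
    raise ValueError(f"unknown mode: {mode}")

# Toy usage: 4 patches + [CLS], 8-dim tokens, 6 cameras, camera-only encoding.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((5, 8))
pos = rng.standard_normal((5, 8))
sie = rng.standard_normal((6, 8))
out = add_side_information(tokens, pos, sie, side_index(3, 1, 2, "camera"))
print(out.shape)  # (5, 8)
```

Because the embedding is keyed only by an index, switching between camera-only, domain-only, and joint encoding changes just the table size and the index mapping, which is what makes comparing these variants straightforward.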
📝 Abstract
Vision Transformers (ViTs) have demonstrated impressive performance across a wide range of biometric tasks, including face and body recognition. In this work, we adapt a ViT model pretrained on visible (VIS) imagery to the challenging problem of cross-spectral body recognition, which involves matching images captured in the visible and infrared (IR) domains. Recent ViT architectures have explored incorporating additional embeddings beyond traditional positional embeddings. Building on this idea, we integrate Side Information Embedding (SIE) and examine the impact of encoding domain and camera information to enhance cross-spectral matching. Surprisingly, our results show that encoding only camera information, without explicitly incorporating domain information, achieves state-of-the-art performance on the LLCM dataset. While occlusion handling has been extensively studied in visible-spectrum person re-identification (Re-ID), occlusions in visible-infrared Re-ID (VI-ReID) remain largely underexplored, primarily because existing VI-ReID datasets, such as LLCM, SYSU-MM01, and RegDB, predominantly feature full-body, unoccluded images. To address this gap, we analyze the impact of range-induced occlusions using the IARPA Janus Benchmark Multi-Domain Face (IJB-MDF) dataset, which provides a diverse set of visible and infrared images captured at various distances, enabling cross-range, cross-spectral evaluations.
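One way to picture a cross-range, cross-spectral evaluation is rank-1 identification accuracy reported separately for each capture distance. The sketch below is an assumption, not the IJB-MDF protocol. It treats IR images as probes and VIS images as the gallery, ranks gallery entries by cosine similarity, and groups results by a per-probe range label. The function name `rank1_by_range` and the range labels are hypothetical.

```python
import numpy as np

def l2_normalize(x):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def rank1_by_range(probe_feats, probe_ids, probe_ranges, gallery_feats, gallery_ids):
    """Rank-1 accuracy of probes (e.g. IR) against a gallery (e.g. VIS), per range.

    probe_feats:   (P, D) probe feature vectors
    probe_ids:     (P,) identity labels of the probes
    probe_ranges:  (P,) capture-distance labels (hypothetical bucket names)
    gallery_feats: (G, D) gallery feature vectors
    gallery_ids:   (G,) identity labels of the gallery entries
    """
    # Cosine similarity between every probe and every gallery entry: (P, G).
    sims = l2_normalize(probe_feats) @ l2_normalize(gallery_feats).T
    # Identity of the single most similar gallery entry for each probe.
    top1 = gallery_ids[np.argmax(sims, axis=1)]
    correct = top1 == probe_ids
    # Average correctness within each range bucket.
    return {str(r): float(correct[probe_ranges == r].mean())
            for r in np.unique(probe_ranges)}
```

Reporting accuracy per range bucket rather than as a single pooled number makes any degradation from range-induced occlusion directly visible as the capture distance grows.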