Causal Bootstrapped Alignment for Unsupervised Video-Based Visible-Infrared Person Re-Identification

📅 2026-04-16

🤖 AI Summary
This work addresses unreliable pseudo-labels and cross-modal misalignment in unsupervised video-based visible-infrared person re-identification, both of which stem from modality bias and identity confusion. The authors propose a Causal Bootstrapped Alignment (CBA) framework that combines causal intervention with prototype-guided uncertainty refinement. Causal Intervention Warm-up (CIW) suppresses spurious modality- and motion-induced correlations, while Prototype-Guided Uncertainty Refinement (PGUR) performs coarse-to-fine cross-modal alignment to mitigate the clustering granularity imbalance between the visible and infrared modalities. Experiments on the HITSZ-VCM and BUPTCampus benchmarks show that the method significantly outperforms existing unsupervised approaches extended to this setting, supporting its use for all-day video-based cross-modal person re-identification.

📝 Abstract
Video-based Visible-Infrared person Re-Identification (VVI-ReID) is a critical technique for all-day surveillance, where temporal information provides additional cues beyond static images. However, existing approaches rely heavily on fully supervised learning with expensive cross-modality annotations, limiting scalability. To address this issue, we investigate Unsupervised Learning for VVI-ReID (USL-VVI-ReID), which learns identity-discriminative representations directly from unlabeled video tracklets. Directly extending image-based USL-VI-ReID methods to this setting with generic pretrained encoders leads to suboptimal performance. Such encoders suffer from weak identity discrimination and strong modality bias, resulting in severe intra-modality identity confusion and pronounced clustering granularity imbalance between visible and infrared modalities. These issues jointly degrade pseudo-label reliability and hinder effective cross-modality alignment. To address these challenges, we propose a Causal Bootstrapped Alignment (CBA) framework that explicitly exploits inherent video priors. First, we introduce Causal Intervention Warm-up (CIW), which performs sequence-level causal interventions by leveraging temporal identity consistency and cross-modality identity consistency to suppress modality- and motion-induced spurious correlations while preserving identity-relevant semantics, yielding cleaner representations for unsupervised clustering. Second, we propose Prototype-Guided Uncertainty Refinement (PGUR), which employs a coarse-to-fine alignment strategy to resolve cross-modality granularity mismatch, reorganizing under-clustered infrared representations under the guidance of reliable visible prototypes with uncertainty-aware supervision. Extensive experiments on the HITSZ-VCM and BUPTCampus benchmarks demonstrate that CBA significantly outperforms existing USL-VI-ReID methods when extended to the USL-VVI-ReID setting.
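The abstract does not give formulas for PGUR, so the following is only a toy sketch of the general idea it describes: assigning infrared tracklet features to visible-cluster prototypes and down-weighting uncertain assignments. Everything concrete here is an assumption made for illustration, not the authors' method: the cosine similarity, the softmax temperature, the entropy-based weight, and the hypothetical helper names `build_prototypes` and `soft_assign`.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mean_vector(vectors):
    # Element-wise mean of equally sized vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def build_prototypes(features, labels):
    # Hypothetical helper: one prototype per visible cluster, computed as the
    # normalized mean of that cluster's normalized features.
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(l2_normalize(f))
    return {y: l2_normalize(mean_vector(fs)) for y, fs in groups.items()}

def soft_assign(feature, prototypes, temperature=0.1):
    # Hypothetical helper: return (label, weight) for one infrared feature.
    # The weight is 1 minus the normalized softmax entropy (an assumed
    # uncertainty measure), so ambiguous assignments get a weight near 0.
    f = l2_normalize(feature)
    labels = sorted(prototypes)
    sims = [sum(a * b for a, b in zip(f, prototypes[y])) for y in labels]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in sims]
    z = sum(exps)
    probs = [e / z for e in exps]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(labels)) if len(labels) > 1 else 1.0
    weight = 1.0 - entropy / max_entropy
    best = labels[probs.index(max(probs))]
    return best, weight

# Toy data: two visible clusters in a 2-D feature space.
visible_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
visible_labels = [0, 0, 1, 1]
prototypes = build_prototypes(visible_features, visible_labels)

clear_label, clear_weight = soft_assign([1.0, 0.05], prototypes)
ambiguous_label, ambiguous_weight = soft_assign([1.0, 1.0], prototypes)
```

In this toy setup, the feature close to cluster 0 receives a weight near 1, while the feature equidistant from both prototypes receives a weight near 0, so it would contribute little to any loss that uses the weight as a per-sample factor.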
Problem

Research questions and friction points this paper is trying to address.

Visible-Infrared Person Re-Identification
Unsupervised Learning
Modality Bias
Pseudo-label Reliability
Cross-modality Alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Intervention
Unsupervised Video ReID
Cross-Modality Alignment
Prototype Guidance
Temporal Consistency
Shuang Li
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Jiaxu Leng
Chongqing University of Posts and Telecommunications
Computer Vision
Changjiang Kuang
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Mingpi Tan
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Yu Yuan
University of Science and Technology of China
Robustness, LLM
Xinbo Gao
School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China