Harnessing Weak Pair Uncertainty for Text-based Person Search

📅 2026-04-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limitation of existing text–person image matching methods, which rely exclusively on strict one-to-one positive pairs and overlook weakly aligned positive samples arising from multi-view annotations. To tackle this issue, the authors propose an uncertainty-aware group-level matching framework that explicitly models the uncertainty inherent in weak positive pairs and adaptively adjusts loss weights accordingly. Furthermore, a group-level image–text matching loss is introduced to effectively leverage information from weak positives while avoiding the erroneous repulsion of potentially valid matches. The proposed approach achieves significant performance gains, improving mean average precision (mAP) by 3.06%, 3.55%, and 6.94% on the CUHK-PEDES, RSTPReid, and ICFG-PEDES benchmarks, respectively, outperforming current state-of-the-art methods.

📝 Abstract
In this paper, we study text-based person search, which aims to retrieve the person of interest via a natural language description. Prevailing methods, such as those based on contrastive learning, usually focus on strict one-to-one pair matching between the visual and textual modalities. However, such a paradigm unintentionally disregards weak positive image-text pairs, which depict the same person but whose text descriptions are annotated from different views (cameras). To make full use of weak positives, we introduce an uncertainty-aware method that explicitly estimates image-text pair uncertainty and incorporates the uncertainty into the optimization procedure in a smooth manner. Specifically, our method contains two modules: uncertainty estimation and uncertainty regularization. (1) Uncertainty estimation obtains the relative confidence of the given positive pairs; (2) based on the predicted uncertainty, uncertainty regularization adaptively adjusts the loss weight. Additionally, we introduce a group-wise image-text matching loss to further refine the representation space among the weak pairs. Compared with existing methods, the proposed method explicitly prevents the model from pushing away potentially weak positive candidates. Extensive experiments on three widely used datasets, i.e., CUHK-PEDES, RSTPReid and ICFG-PEDES, verify the mAP improvement of our method over existing competitive methods: +3.06%, +3.55% and +6.94%, respectively.
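The abstract does not give the loss formulas, so the following is only a minimal sketch of the two ideas under stated assumptions, not the authors' implementation. It assumes a log-variance style weighting (each pair loss is scaled by `exp(-s)` with `s` added as a penalty) for the uncertainty regularization, and a multi-positive softmax loss in which every image sharing the query's identity counts as a positive for the group-wise matching. The function names, the `temperature` default, and the toy inputs are all hypothetical.

```python
import math


def uncertainty_weighted_loss(pair_losses, log_variances):
    """Average of exp(-s_i) * loss_i + s_i over all pairs.

    Assumption: s_i is a predicted log-variance for pair i. A larger s_i
    (less confident pair) shrinks that pair's loss weight, while the + s_i
    term keeps the model from inflating uncertainty everywhere.
    """
    if len(pair_losses) != len(log_variances):
        raise ValueError("pair_losses and log_variances must have equal length")
    total = 0.0
    for loss, s in zip(pair_losses, log_variances):
        total += math.exp(-s) * loss + s
    return total / len(pair_losses)


def group_matching_loss(similarities, image_ids, query_id, temperature=0.02):
    """Multi-positive softmax loss for a single text query.

    Assumption: every image whose identity equals query_id is a positive,
    so weak positives (same person, different view) are pulled toward the
    query instead of being treated as negatives.
    """
    logits = [s / temperature for s in similarities]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    positive_mass = sum(e for e, pid in zip(exps, image_ids) if pid == query_id)
    if positive_mass == 0.0:
        raise ValueError("no image in the batch shares the query identity")
    return -math.log(positive_mass / sum(exps))


# Toy usage: images 0 and 1 show person 7; image 2 shows person 3.
sims = [0.9, 0.8, 0.1]
group = group_matching_loss(sims, image_ids=[7, 7, 3], query_id=7)
# Strict one-to-one baseline: only image 0 counts as the positive.
strict = group_matching_loss(sims, image_ids=[7, -1, 3], query_id=7)
```

In this toy setup the group-wise loss is lower than the strict one-to-one loss, because the second image of the same person contributes to the positive mass rather than being repelled as a negative.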
Problem

Research questions and friction points this paper is trying to address.

text-based person search
weak positive pairs
image-text matching
pair uncertainty
cross-modal retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

weak positive pairs
uncertainty estimation
uncertainty regularization
text-based person search
group-wise matching