ICON: Invariant Counterfactual Optimization with Neuro-Symbolic Priors for Text-Based Person Search

📅 2026-01-22
🤖 AI Summary
Existing text-based person retrieval methods are vulnerable to spurious correlations and spatial-semantic misalignment in open-world scenarios, exhibiting limited robustness to distributional shifts. This work proposes the first retrieval framework that integrates causal and topological priors, effecting a paradigm shift from statistical fitting to causal invariant learning. The approach leverages four key mechanisms: rule-guided spatial intervention, counterfactual context disentanglement, saliency-driven semantic regularization, and neuro-symbolic topological alignment. Evaluated on standard benchmarks, the method achieves state-of-the-art performance and demonstrates exceptional robustness under challenging conditions such as occlusion, background clutter, and localization noise.

📝 Abstract
Text-Based Person Search (TBPS) holds unique value in real-world surveillance, bridging visual perception and language understanding, yet current paradigms built on pre-trained models often fail to transfer effectively to complex open-world scenarios. Their reliance on "Passive Observation" leads to multifaceted spurious correlations and spatial-semantic misalignment, causing a lack of robustness against distribution shifts. To resolve these defects, this paper proposes ICON (Invariant Counterfactual Optimization with Neuro-symbolic priors), a framework integrating causal and topological priors. First, we introduce Rule-Guided Spatial Intervention, which penalizes sensitivity to bounding box noise and severs location shortcuts to achieve geometric invariance. Second, Counterfactual Context Disentanglement is implemented via semantic-driven background transplantation, compelling the model to ignore background interference and so achieve environmental independence. Third, we employ Saliency-Driven Semantic Regularization with adaptive masking to counter local saliency bias and preserve holistic completeness. Finally, Neuro-Symbolic Topological Alignment uses neuro-symbolic priors to constrain feature matching, ensuring that activated regions are topologically consistent with human structural logic. Experimental results demonstrate that ICON not only maintains leading performance on standard benchmarks but also exhibits exceptional robustness against occlusion, background interference, and localization noise. The approach advances the field by shifting from fitting statistical co-occurrences to learning causal invariance.
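
The abstract does not give implementation details, so the following is only a toy sketch of the general idea behind the first two mechanisms: perturb the bounding box, or swap the background, and penalize any change in the resulting embedding. Every name here (`jitter_box`, `transplant_background`, `toy_embed`, `invariance_penalty`) is hypothetical, the mean-color embedding stands in for a real image encoder, and the cosine penalty is an assumed choice rather than the loss used in ICON.

```python
import numpy as np

def jitter_box(box, img_hw, max_shift=0.1, rng=None):
    """Perturb an (x1, y1, x2, y2) box by up to max_shift of its width/height."""
    rng = np.random.default_rng() if rng is None else rng
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    H, W = img_hw
    dx = rng.uniform(-max_shift, max_shift, size=2) * w
    dy = rng.uniform(-max_shift, max_shift, size=2) * h
    # Clip so the jittered box stays inside the image and keeps positive area.
    nx1 = int(np.clip(x1 + dx[0], 0, W - 1))
    ny1 = int(np.clip(y1 + dy[0], 0, H - 1))
    nx2 = int(np.clip(x2 + dx[1], nx1 + 1, W))
    ny2 = int(np.clip(y2 + dy[1], ny1 + 1, H))
    return nx1, ny1, nx2, ny2

def transplant_background(image, person_mask, background):
    """Keep person pixels (mask == 1) and replace the rest with another background."""
    m = person_mask[..., None].astype(image.dtype)
    return m * image + (1.0 - m) * background

def toy_embed(img):
    """Stand-in encoder (assumption): L2-normalized per-channel mean color."""
    v = img.reshape(-1, img.shape[-1]).mean(axis=0)
    return v / (np.linalg.norm(v) + 1e-8)

def invariance_penalty(f_a, f_b):
    """Cosine distance between two embeddings; 0 means the feature did not move."""
    return 1.0 - float(np.dot(f_a, f_b))

def crop(img, box):
    x1, y1, x2, y2 = box
    return img[y1:y2, x1:x2]

rng = np.random.default_rng(0)
image = rng.random((64, 32, 3))        # H x W x 3 float image
background = rng.random((64, 32, 3))   # donor background of the same size
mask = np.zeros((64, 32))
mask[8:56, 8:24] = 1.0                 # toy person region
box = (8, 8, 24, 56)

# Spatial intervention: the embedding should not change under box noise.
jittered = jitter_box(box, image.shape[:2], rng=rng)
spatial_loss = invariance_penalty(toy_embed(crop(image, box)),
                                  toy_embed(crop(image, jittered)))

# Counterfactual context: the embedding should not change when the background does.
counterfactual = transplant_background(image, mask, background)
context_loss = invariance_penalty(toy_embed(image), toy_embed(counterfactual))
```

In a training loop, terms like `spatial_loss` and `context_loss` would be added to the retrieval objective with some weighting; how ICON weights or formulates them is not stated in the abstract.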
Problem

Research questions and friction points this paper is trying to address.

Text-Based Person Search
spurious correlations
distribution shifts
robustness
semantic misalignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causal Invariance
Neuro-Symbolic Priors
Counterfactual Disentanglement
Spatial Intervention
Topological Alignment
Xiangyu Wang
Professor, Curtin University
Civil Engineering, Building Information Modeling, Smart City, Automation and Robotics, Smart
Zhixin Lv
College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, China
Yongjiao Sun
College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, China; Key Laboratory of Intelligent Computing in Medical Image of Ministry of Education, Northeastern University, Shenyang, 110819, China
Anrui Han
College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, China
Ye Yuan
School of AI, Beijing Institute of Technology, Beijing, 100081, China
Hangxu Ji
College of Computer Science and Engineering, Northeastern University, Shenyang, 110819, China; Foshan Graduate School of Innovation, Northeastern University, Foshan, 528311, China