Beyond Visual Cues: Semantic-Driven Token Filtering and Expert Routing for Anytime Person ReID

📅 2026-04-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of person re-identification under challenging scenarios such as day-night modality shifts and long-term clothing changes, which stems from overreliance on volatile visual features. To mitigate this, the paper proposes an approach built on large vision-language models. Specifically, it uses instruction-guided generation of identity-consistent semantic text, which is, for the first time, employed to drive semantic-driven visual token filtering (SVTF) and semantic-driven expert routing (SER). This strategy yields identity representations that are robust to variations in illumination and attire. The method achieves state-of-the-art performance on the AT-USTC dataset and shows strong generalization, with competitive results across five mainstream re-identification benchmarks.

📝 Abstract

Any-Time Person Re-identification (AT-ReID) requires robust retrieval of target individuals under arbitrary conditions, encompassing both modality shifts (daytime and nighttime) and clothing-change scenarios ranging from short-term to long-term intervals. However, existing methods rely heavily on purely visual features, which change with environmental and temporal factors, resulting in significant performance deterioration under illumination-induced modality shifts or clothing changes. In this paper, we propose Semantic-driven Token Filtering and Expert Routing (STFER), a novel framework that leverages the ability of Large Vision-Language Models (LVLMs) to generate identity-consistent text, which provides identity-discriminative features that are robust to both clothing variations and cross-modality shifts between RGB and IR. Specifically, we employ instructions to guide the LVLM in generating identity-intrinsic semantic text that captures biometric constants and drives the semantic modules of the model. The text token is then used for Semantic-driven Visual Token Filtering (SVTF), which enhances informative visual regions and suppresses redundant background noise. The text token is also used for Semantic-driven Expert Routing (SER), which integrates the semantic text into expert routing, resulting in more robust multi-scenario gating. Extensive experiments on the Any-Time ReID dataset (AT-USTC) demonstrate that our model achieves state-of-the-art results. Moreover, the model trained on AT-USTC was evaluated on five widely used ReID benchmarks, demonstrating superior generalization with highly competitive results. Our code will be available soon.
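
To make the two text-driven steps in the abstract more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation (their code has not been released), and the abstract does not specify the scoring or gating rules. The cosine-similarity scoring, the hard top-k selection, the `keep_ratio` parameter, the linear softmax gate over concatenated features, and all function names are assumptions chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def filter_tokens(visual_tokens, text_token, keep_ratio=0.5):
    """Assumed filtering rule: keep the visual tokens most similar to the text token.

    Returns the indices of the kept tokens in their original order.
    """
    scores = [cosine(t, text_token) for t in visual_tokens]
    k = max(1, math.ceil(keep_ratio * len(visual_tokens)))
    ranked = sorted(range(len(visual_tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def mean_pool(tokens):
    """Average a non-empty list of token vectors into one feature vector."""
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]


def route_experts(visual_feat, text_token, gate_weights, experts):
    """Assumed routing rule: a linear softmax gate over [visual ; text].

    gate_weights has one row per expert; each expert maps a vector to a
    vector of the same length. Returns (gate probabilities, mixed output).
    """
    gate_in = visual_feat + text_token  # list concatenation
    logits = [sum(w * x for w, x in zip(row, gate_in)) for row in gate_weights]
    probs = softmax(logits)
    outputs = [expert(visual_feat) for expert in experts]
    mixed = [
        sum(p * out[d] for p, out in zip(probs, outputs))
        for d in range(len(visual_feat))
    ]
    return probs, mixed


# Toy data: four 2-D visual tokens and one 2-D text token (all made up).
visual_tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
text_token = [1.0, 0.0]

keep = filter_tokens(visual_tokens, text_token, keep_ratio=0.5)  # [0, 2]
pooled = mean_pool([visual_tokens[i] for i in keep])

# Two hypothetical experts and a hand-written gate matrix (2 experts x 4 inputs).
experts = [lambda x: [v * 1.0 for v in x], lambda x: [v * 2.0 for v in x]]
gate_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
probs, mixed = route_experts(pooled, text_token, gate_weights, experts)
```

In this toy setup, the tokens pointing away from the text token are discarded before pooling, and the gate sees the text token alongside the pooled visual feature, so the expert mixture depends on semantics as well as appearance. A real model would learn the gate weights and would likely use a soft or learned filtering rule rather than a fixed top-k cut.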

Problem

Research questions and friction points this paper is trying to address.

Any-Time Person Re-identification

modality shifts

clothing-change

visual features

identity consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Driven Token Filtering

Expert Routing

Large Vision-Language Models