🤖 AI Summary
To address insufficient and weakly discriminative feature selection in video-based person re-identification (re-ID), this paper proposes the Hierarchical and Adaptive Mixture of Biometric Experts (HAMoBE) framework. HAMoBE emulates human multimodal perception by introducing, for the first time, a multi-expert collaborative architecture coupled with a dual-input decision gating mechanism, enabling dynamic selection and hierarchical modeling of appearance, body-shape, and gait features from query-gallery video pairs. Leveraging multi-granularity features extracted by a frozen pre-trained model (e.g., CLIP), HAMoBE employs specialized expert networks to separately model long-term, short-term, and temporal patterns, with adaptive fusion guided by the gating mechanism. Evaluated on benchmarks including MEVID, HAMoBE achieves up to a 13.0% improvement in Rank-1 accuracy, demonstrating markedly improved robustness and generalization in complex, real-world scenarios.
📝 Abstract
Recently, research interest in person re-identification (ReID) has increasingly focused on video-based scenarios, which are essential for robust surveillance and security in varied and dynamic environments. However, existing video-based ReID methods often overlook the necessity of identifying and selecting the most discriminative features from both videos in a query-gallery pair for effective matching. To address this issue, we propose a novel Hierarchical and Adaptive Mixture of Biometric Experts (HAMoBE) framework, which leverages multi-layer features from a pre-trained large model (e.g., CLIP) and is designed to mimic human perceptual mechanisms by independently modeling key biometric features (appearance, static body shape, and dynamic gait) and adaptively integrating them. Specifically, HAMoBE includes two levels: the first level extracts low-level features from multi-layer representations provided by the frozen large model, while the second level consists of specialized experts focusing on long-term, short-term, and temporal features. To ensure robust matching, we introduce a new dual-input decision gating network that dynamically adjusts the contributions of each expert based on their relevance to the input scenarios. Extensive evaluations on benchmarks like MEVID demonstrate that our approach yields significant performance improvements (e.g., +13.0% Rank-1 accuracy).
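The adaptive fusion the abstract describes can be sketched in miniature: three "experts" each produce a descriptor for the query and gallery videos, and a dual-input gate scores each expert from both videos before fusing per-expert similarities. Everything below (the linear gate, the feature dimension, random descriptors, cosine-similarity fusion) is an illustrative assumption, not the paper's actual architecture.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def cosine(a, b):
    """Cosine similarity between two descriptors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
D = 8                                      # toy descriptor dimension
EXPERTS = ["appearance", "shape", "gait"]  # the three biometric experts

# Hypothetical per-expert descriptors for one query-gallery video pair
# (in the paper these would come from the specialized expert networks).
query   = {e: rng.standard_normal(D) for e in EXPERTS}
gallery = {e: rng.standard_normal(D) for e in EXPERTS}

# Dual-input gate: a toy linear layer that scores each expert from the
# pooled features of BOTH videos, so expert weights depend on the pair.
W = rng.standard_normal((len(EXPERTS), 2 * D)) * 0.1
pooled = np.concatenate([
    np.mean([query[e] for e in EXPERTS], axis=0),
    np.mean([gallery[e] for e in EXPERTS], axis=0),
])
weights = softmax(W @ pooled)  # one adaptive weight per expert

# Adaptive fusion: matching score = gate-weighted per-expert similarity.
score = sum(w * cosine(query[e], gallery[e])
            for w, e in zip(weights, EXPERTS))
```

The key design point mirrored here is that the gate sees both inputs of the pair, so an expert (e.g., gait) can be down-weighted for pairs where its cue is unreliable rather than receiving a fixed global weight.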