HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID

📅 2025-08-07
🤖 AI Summary
To address the weak selection of discriminative features in video-based person re-identification (ReID), this paper proposes the Hierarchical and Adaptive Mixture of Biometric Experts (HAMoBE) framework. Inspired by human perception, HAMoBE models appearance, static body shape, and dynamic gait independently, then combines specialized experts through a dual-input decision gating network that weights each expert according to the query-gallery video pair being matched. The first level extracts low-level features from the multi-layer representations of a frozen pre-trained large model (e.g., CLIP). The second level consists of experts for long-term, short-term, and temporal features, whose outputs the gating network fuses adaptively. On benchmarks including MEVID, HAMoBE reports significant performance gains (e.g., +13.0% Rank-1 accuracy).
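The summary does not specify how the long-term, short-term, and temporal experts are built, so the following is only a minimal sketch. It assumes each video is already a list of per-frame feature vectors (for example, pooled from a frozen backbone). The helper names `long_term_view`, `short_term_views`, and `temporal_view` are hypothetical. The fixed pooling rules stand in for what HAMoBE implements as learned expert networks and serve only to show how clip-level, window-level, and frame-difference views of the same sequence differ.

```python
from typing import List

Vector = List[float]


def mean_vec(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def long_term_view(frames: List[Vector]) -> Vector:
    # Hypothetical long-term view: average over the whole clip, which keeps
    # slowly varying cues such as clothing and overall appearance.
    return mean_vec(frames)


def short_term_views(frames: List[Vector], window: int = 2) -> List[Vector]:
    # Hypothetical short-term views: one average per sliding window of
    # `window` consecutive frames (requires len(frames) >= window).
    return [mean_vec(frames[i:i + window]) for i in range(len(frames) - window + 1)]


def temporal_view(frames: List[Vector]) -> Vector:
    # Hypothetical temporal view: mean absolute change between consecutive
    # frames, a crude stand-in for motion (gait) cues. Needs >= 2 frames.
    diffs = [[abs(b - a) for a, b in zip(prev, curr)]
             for prev, curr in zip(frames, frames[1:])]
    return mean_vec(diffs)


frames = [[1.0, 0.0], [3.0, 0.0], [5.0, 2.0]]
print(long_term_view(frames))
print(short_term_views(frames))
print(temporal_view(frames))
```

For the three example frames, the long-term view is about `[3.0, 0.667]`, the two short-term windows give `[2.0, 0.0]` and `[4.0, 1.0]`, and the temporal view is `[2.0, 1.0]`.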


📝 Abstract
Recently, research interest in person re-identification (ReID) has increasingly focused on video-based scenarios, which are essential for robust surveillance and security in varied and dynamic environments. However, existing video-based ReID methods often overlook the necessity of identifying and selecting the most discriminative features from both videos in a query-gallery pair for effective matching. To address this issue, we propose a novel Hierarchical and Adaptive Mixture of Biometric Experts (HAMoBE) framework, which leverages multi-layer features from a pre-trained large model (e.g., CLIP) and is designed to mimic human perceptual mechanisms by independently modeling key biometric features (appearance, static body shape, and dynamic gait) and adaptively integrating them. Specifically, HAMoBE includes two levels: the first level extracts low-level features from multi-layer representations provided by the frozen large model, while the second level consists of specialized experts focusing on long-term, short-term, and temporal features. To ensure robust matching, we introduce a new dual-input decision gating network that dynamically adjusts the contributions of each expert based on their relevance to the input scenarios. Extensive evaluations on benchmarks like MEVID demonstrate that our approach yields significant performance improvements (e.g., +13.0% Rank-1 accuracy).
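A dual-input gate differs from a standard mixture-of-experts gate because its weights depend on both videos of the pair rather than on a single input. The sketch below is a hypothetical illustration and is not the authors' design. It assumes each expert (e.g., appearance, gait) has already produced one feature vector per video, and that the gate scores each expert with a linear function of the element-wise product of the query and gallery features. The final matching score is then the gate-weighted sum of per-expert cosine similarities. The function names, the pair feature, and the linear gate are all assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dual_input_gate(query: Dict[str, Vector], gallery: Dict[str, Vector],
                    gate_w: Dict[str, Vector], gate_b: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical gate: one weight per expert, computed from BOTH videos."""
    names = list(query)
    logits = []
    for name in names:
        # Assumed pair feature: element-wise product of query and gallery features.
        pair = [q * g for q, g in zip(query[name], gallery[name])]
        logits.append(sum(w * p for w, p in zip(gate_w[name], pair)) + gate_b[name])
    return dict(zip(names, softmax(logits)))


def fused_score(query: Dict[str, Vector], gallery: Dict[str, Vector],
                gate_w: Dict[str, Vector], gate_b: Dict[str, float]) -> float:
    """Gate-weighted sum of per-expert cosine similarities."""
    weights = dual_input_gate(query, gallery, gate_w, gate_b)
    return sum(weights[name] * cosine(query[name], gallery[name]) for name in query)


query = {"appearance": [1.0, 0.0], "gait": [0.0, 1.0]}
gallery = {"appearance": [1.0, 0.0], "gait": [1.0, 0.0]}
zero_w = {"appearance": [0.0, 0.0], "gait": [0.0, 0.0]}
zero_b = {"appearance": 0.0, "gait": 0.0}
print(fused_score(query, gallery, zero_w, zero_b))
```

With all gate weights and biases at zero, the gate falls back to uniform weights. An appearance expert that matches perfectly (cosine 1) and a gait expert that is orthogonal (cosine 0) therefore fuse to 0.5. In a trained model the gate parameters would be learned, so the balance between experts could shift from one query-gallery pair to the next.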
Problem

Research questions and friction points this paper is trying to address.

Selecting the most discriminative features from both videos of a query-gallery pair in video-based person ReID
Adaptively integrating appearance, body shape, and gait features
Dynamic expert contribution adjustment for robust matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical and Adaptive Mixture of Biometric Experts
Multi-layer features from a frozen pre-trained large model (e.g., CLIP)
Dual-input decision gating network that dynamically weights each expert's contribution
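The summary states only that the first level draws on multi-layer representations from a frozen model. A common way to combine features from several layers is a learned softmax-weighted sum over layer outputs. The sketch below assumes that scheme, and `fuse_layers` and its arguments are hypothetical rather than taken from the paper.

```python
import math
from typing import List

Vector = List[float]


def fuse_layers(layer_feats: List[Vector], layer_logits: List[float]) -> Vector:
    """Hypothetical: softmax-weighted sum of per-layer features from a frozen backbone.

    layer_feats[i] is the (pooled) feature vector taken from layer i;
    layer_logits[i] is a learnable scalar score for that layer.
    """
    # Numerically stable softmax over the per-layer scores.
    m = max(layer_logits)
    exps = [math.exp(z - m) for z in layer_logits]
    total = sum(exps)
    alphas = [e / total for e in exps]
    # Weighted sum across layers, dimension by dimension.
    dim = len(layer_feats[0])
    return [sum(a * f[d] for a, f in zip(alphas, layer_feats)) for d in range(dim)]


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fuse_layers(feats, [0.0, 0.0, 0.0]))
```

With equal logits every layer receives weight 1/3, so the example returns approximately `[0.667, 0.667]`. In this scheme the backbone stays frozen and only the layer scores (plus any downstream modules) would be trained.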
Yiyang Su
Michigan State University
Yunping Shi
Department of Computer Science, Drexel University
Feng Liu
Department of Computer Science, Drexel University
Xiaoming Liu
Department of Computer Science and Engineering, Michigan State University