🤖 AI Summary
Addressing long-term person re-identification (ReID) under unconstrained conditions, characterized by large variations in viewpoint, distance, imaging conditions, and clothing, this paper proposes ECHO-BID. Methodologically, it employs the object-pretrained EVA-02 Large backbone to learn identity representations that remain robust over time; models identity consistency across clothing changes via implicit object reasoning; and shows that a smaller but more challenging transfer-learning dataset generalizes better than a larger, easier one, systematically validating the critical impact of backbone architecture and transfer protocol on performance. Evaluated on multiple standard long-term ReID benchmarks, ECHO-BID substantially outperforms state-of-the-art methods, particularly under severe occlusion and large-scale clothing variation.
📝 Abstract
Person identification in unconstrained viewing environments presents significant challenges due to variations in distance, viewpoint, imaging conditions, and clothing. We introduce $\textbf{E}$va $\textbf{C}$lothes-Change from $\textbf{H}$idden $\textbf{O}$bjects - $\textbf{B}$ody $\textbf{ID}$entification (ECHO-BID), a class of long-term re-id models built on object-pretrained EVA-02 Large backbones. We compare ECHO-BID to 9 other models that vary systematically in backbone architecture, model size, scale of object classification pretraining, and transfer learning protocol. Models were evaluated on benchmark datasets across constrained, unconstrained, and occluded settings. ECHO-BID, with transfer learning on the most challenging clothes-change data, achieved state-of-the-art results on long-term re-id -- substantially outperforming other methods. ECHO-BID also surpassed other methods by a wide margin in occluded viewing scenarios. A combination of increased model size and Masked Image Modeling during pretraining underlie ECHO-BID's strong performance on long-term re-id. Notably, a smaller, but more challenging transfer learning dataset, generalized better across datasets than a larger, less challenging one. However, the larger dataset with an additional fine-tuning step proved best on the most difficult data. Selecting the correct pretrained backbone architecture and transfer learning protocols can drive substantial gains in long-term re-id performance.