AI Summary
This study addresses the challenge of detecting and identifying individual penguins in animal facilities, where high visual similarity between individuals, variable postures, and water-surface reflections severely hinder performance. To overcome these issues, the authors propose a unified detection and re-identification framework that jointly leverages appearance and motion cues. In the detection stage, YOLO11 is adapted to take consecutive frames as input, improving temporal consistency and robustness when distinctive visual features are obscured. For re-identification, a tracklet-level contrastive learning strategy is introduced with the aim of mitigating identity switches. Experimental results show that fine-tuning with two-frame inputs increases detection mAP@0.5 from 0.922 to 0.933 and recovers individuals that cannot be distinguished in static images, while qualitative visualization shows feature embeddings that form coherent per-individual clusters.
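The text does not say how consecutive frames are fed to YOLO11. One common option, assumed here purely for illustration, is to concatenate the previous and current frames along the channel axis, so an RGB pair becomes a 6-channel input; a detector consuming it would need its first convolution widened to 2C input channels. The helpers `stack_frames` and `two_frame_inputs` are hypothetical and are not part of any Ultralytics API; frames are plain nested lists of shape (H, W, C) to keep the sketch dependency-free.

```python
from typing import List

# Hypothetical frame representation: an H x W x C nested list, e.g. RGB values in [0, 1].
Frame = List[List[List[float]]]


def stack_frames(prev: Frame, curr: Frame) -> Frame:
    """Concatenate two frames channel-wise: (H, W, C) + (H, W, C) -> (H, W, 2C)."""
    same_shape = len(prev) == len(curr) and all(
        len(p_row) == len(c_row)
        and all(len(p_px) == len(c_px) for p_px, c_px in zip(p_row, c_row))
        for p_row, c_row in zip(prev, curr)
    )
    if not same_shape:
        raise ValueError("frames must have identical shapes")
    # Previous-frame channels first, current-frame channels second.
    return [
        [p_px + c_px for p_px, c_px in zip(p_row, c_row)]
        for p_row, c_row in zip(prev, curr)
    ]


def two_frame_inputs(frames: List[Frame]) -> List[Frame]:
    """Build one (t-1, t) input per frame; frame 0 is paired with itself (an assumption)."""
    return [
        stack_frames(frames[t - 1] if t > 0 else frames[0], frames[t])
        for t in range(len(frames))
    ]
```

Repeating the first frame as its own predecessor keeps exactly one input per video frame; dropping that frame, or feeding a frame difference instead of raw concatenation, would be equally plausible choices that the source leaves open.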
Abstract
In animal facilities, continuous surveillance of penguins is essential yet technically challenging due to their homogeneous visual characteristics, rapid and frequent posture changes, and substantial environmental noise such as water reflections. In this study, we propose a framework that improves both detection and identification performance by integrating appearance and motion features. For detection, we adapted YOLO11 to process consecutive frames, addressing the lack of temporal consistency in single-frame detectors. This approach leverages motion cues to detect targets even when distinctive visual features are obscured. Our evaluation shows that fine-tuning the model with two-frame inputs improves mAP@0.5 from 0.922 (baseline) to 0.933 and recovers individuals that are indistinguishable in static images. For identification, we introduce a tracklet-based contrastive learning approach applied after tracking. Qualitative visualization shows that the method produces coherent feature embeddings in which samples from the same individual lie closer together in the feature space, suggesting that it could help mitigate ID switching.
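The abstract describes the identification step only as tracklet-based contrastive learning applied after tracking. A minimal sketch of what such an objective could look like, assuming a supervised contrastive (SupCon-style) loss in which the tracklet ID serves as the label: embeddings from the same tracklet are positives, embeddings from every other tracklet in the batch are negatives, and similarity is cosine similarity divided by a temperature. The function names and the temperature of 0.1 are illustrative assumptions, not the authors' settings, and the sketch only evaluates the loss value; training would compute it inside an autodiff framework.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def tracklet_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    tracklet_ids: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """Hypothetical SupCon-style loss that uses tracklet IDs as labels.

    For each anchor i, positives are the other samples from the same tracklet,
    and the softmax denominator runs over every other sample in the batch.
    """
    if len(embeddings) != len(tracklet_ids):
        raise ValueError("one tracklet ID per embedding is required")
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, n_anchors = 0.0, 0
    for i in range(n):
        others = [a for a in range(n) if a != i]
        logits = {a: dot(z[i], z[a]) / temperature for a in others}
        # Numerically stable log-sum-exp over all non-anchor samples.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        positives = [a for a in others if tracklet_ids[a] == tracklet_ids[i]]
        if not positives:
            continue  # anchors without a positive contribute nothing
        # Mean negative log-probability assigned to the anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        n_anchors += 1
    return total / n_anchors if n_anchors else 0.0


if __name__ == "__main__":
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(tracklet_contrastive_loss(emb, [0, 0, 1, 1]))  # low: tracklets well separated
    print(tracklet_contrastive_loss(emb, [0, 1, 0, 1]))  # high: positives far apart
```

Using tracklets as labels relies on the tracker keeping each tracklet pure (one individual per tracklet); two tracklets of the same penguin sampled in one batch would be treated as negatives, which is a limitation of this sketch rather than a claim about the authors' method.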