🤖 AI Summary
This study addresses reliable detection and re-identification (Re-ID) of Holstein-Friesian cattle in densely grouped settings, where existing methods, including YOLO-based detectors, break down because animals stand close together and their coat patterns break up body outlines. To improve both effectiveness and transferability, the authors propose a detect-segment-identify pipeline that uses Open-Vocabulary Weight-free Localisation and the Segment Anything Model (SAM) as pre-processing stages ahead of Re-ID networks trained with unsupervised contrastive learning. Evaluated on a newly published collection of nine days of CCTV footage from a working dairy farm, the pipeline reaches 98.93% detection accuracy, improving on oriented bounding box and SAM-based detection baselines by 47.52% and 27.13%, respectively, and achieves 94.82% Re-ID accuracy on the test data.
📝 Abstract
Holstein-Friesian detection and re-identification (Re-ID) methods capture individuals well when targets are spatially separate. However, existing approaches, including YOLO-based species detection, break down when cows group closely together. This failure is particularly prevalent for species with outline-breaking coat patterns. To boost both effectiveness and transferability in this setting, we propose a new detect-segment-identify pipeline that uses the Open-Vocabulary Weight-free Localisation and Segment Anything models as pre-processing stages alongside Re-ID networks. To evaluate our approach, we publish a collection of nine days of CCTV data filmed on a working dairy farm. Our methodology overcomes detection breakdown in dense animal groupings, achieving a detection accuracy of 98.93%. This significantly outperforms current oriented bounding box-driven and SAM species detection baselines, with accuracy improvements of 47.52% and 27.13%, respectively. We show that unsupervised contrastive learning can build on this to yield 94.82% Re-ID accuracy on our test data. Our work demonstrates that Re-ID in crowded scenarios is both practical and reliable in working farm settings with no manual intervention. Code and dataset are provided for reproducibility.
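The abstract does not state which contrastive objective is used to train the Re-ID network. As a minimal sketch only, assuming an NT-Xent-style loss over two augmented views of the same segmented cow crops, the idea can be illustrated as follows. The function name `nt_xent_loss`, the temperature value, and the embedding shapes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent-style contrastive loss (assumed objective, not the paper's confirmed one).

    z_a, z_b: arrays of shape (N, D) holding embeddings of the same N crops
    under two different augmentations. Row i of z_a and row i of z_b form a
    positive pair; every other row in the batch acts as a negative.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit length, so dot product = cosine
    sim = (z @ z.T) / temperature                      # (2N, 2N) scaled similarities
    np.fill_diagonal(sim, -np.inf)                     # a sample is never its own negative

    # Index of the positive partner for each row: i <-> i + N.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))

    # Average negative log-probability assigned to the positive partner.
    return float(-log_prob[np.arange(2 * n), pos].mean())


# Hypothetical usage with random vectors standing in for network outputs.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(8, 16))
view_b = view_a + 0.05 * rng.normal(size=(8, 16))    # second view of the same 8 crops
aligned = nt_xent_loss(view_a, view_b)
mismatched = nt_xent_loss(view_a, np.roll(view_b, 1, axis=0))
print(aligned, mismatched)
```

When the two views are correctly paired, the loss is lower than when the pairing is shifted by one row, which is the signal that lets embeddings of the same animal cluster without identity labels.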