WildlifeReID-10k: Wildlife re-identification dataset with 10k individual animals

📅 2024-06-13
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Wildlife re-identification (ReID) lacks large-scale, standardized benchmarks free from data leakage, which hinders fair and robust evaluation.
Method: We introduce WildlifeReID-10k, a standardized benchmark comprising around 33 species, 10,000+ individuals, and 140,000+ images, aggregated from 37 existing datasets. We propose a time-aware and similarity-aware split protocol that reduces visual leakage between training and test sets and supports both closed-set and open-set evaluation. Building on cross-dataset fusion, similarity-driven partitioning, and a customized evaluation framework, we integrate strong baselines (ResNet and TransReID).
Contribution/Results: The benchmark is publicly released on Kaggle. Experiments expose performance bottlenecks of state-of-the-art methods under cross-species generalization, pose variation, and occlusion. WildlifeReID-10k is intended to make wildlife ReID evaluation more robust, fair, and reproducible.

📝 Abstract
This paper introduces WildlifeReID-10k, a new large-scale re-identification benchmark with more than 10k animal identities of around 33 species across more than 140k images, re-sampled from 37 existing datasets. WildlifeReID-10k covers diverse animal species and poses significant challenges for state-of-the-art (SoTA) methods. Its time-aware and similarity-aware split protocol supports fair and robust evaluation; the similarity-aware part is designed to address the common issue of training-to-test data leakage caused by visually similar images appearing in both training and test sets. The WildlifeReID-10k dataset and benchmark are publicly available on Kaggle, along with strong baselines for both closed-set and open-set evaluation, enabling fair, transparent, and standardized evaluation of animal re-identification models, not only multi-species ones.
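To make the idea of a similarity-aware split concrete, here is a minimal sketch. It is not the authors' protocol: the function name `similarity_aware_split`, the cosine-similarity threshold, and the greedy group assignment are all illustrative assumptions. The sketch links images whose feature vectors are nearly identical and then assigns each linked group, as a whole, to either the training or the test set, so near-duplicates never straddle the boundary.

```python
import numpy as np


def similarity_aware_split(features, threshold=0.9, test_fraction=0.2, seed=0):
    """Hypothetical helper: keep near-duplicate images on the same side of the split.

    features: (n_images, dim) array of non-zero feature vectors (assumed given).
    Returns two sorted lists of image indices: (train, test).
    """
    feats = np.asarray(features, dtype=float)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T  # pairwise cosine similarity
    n = len(feats)

    # Union-find: images above the threshold end up in the same group.
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    # Assign whole groups in random order until the test set is large enough.
    rng = np.random.default_rng(seed)
    group_list = list(groups.values())
    train, test = [], []
    target = test_fraction * n
    for k in rng.permutation(len(group_list)):
        side = test if len(test) < target else train
        side.extend(group_list[k])
    return sorted(train), sorted(test)
```

The pairwise loop is quadratic in the number of images, which is fine for illustration but would need an approximate nearest-neighbour search at the scale of 140k images.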
Problem

Research questions and friction points this paper is trying to address.

Introduces the WildlifeReID-10k dataset for animal re-identification
Addresses training-to-test data leakage caused by visually similar images
Enables fair evaluation of multi-species re-identification models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset with 10k animal identities
Time-aware and similarity-aware split protocol
Public benchmark for fair model evaluation
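
A time-aware split can be sketched in a similar spirit. The summary above does not spell out the exact rule, so the following is an assumption-laden illustration rather than the published protocol: for each identity, images from its most recent capture dates go to the test set, and all images taken on the same date stay together. The function name `time_aware_split` and the `(image_id, identity, date)` record layout are hypothetical.

```python
from collections import defaultdict
from datetime import date


def time_aware_split(records, test_fraction=0.2):
    """Hypothetical helper: per identity, send the latest capture dates to test.

    records: iterable of (image_id, identity, capture_date) tuples.
    Returns two lists of image ids: (train, test).
    """
    by_identity = defaultdict(list)
    for image_id, identity, when in records:
        by_identity[identity].append((when, image_id))

    train, test = [], []
    for items in by_identity.values():
        days = sorted({when for when, _ in items})
        n_test_days = int(len(days) * test_fraction)
        if len(days) > 1:
            # Identities seen on several dates contribute at least one test date.
            n_test_days = max(1, n_test_days)
        test_days = set(days[len(days) - n_test_days:]) if n_test_days else set()
        for when, image_id in items:
            (test if when in test_days else train).append(image_id)
    return train, test


example = [
    ("a1", "turtle_A", date(2020, 1, 1)),
    ("a2", "turtle_A", date(2020, 1, 1)),
    ("a3", "turtle_A", date(2021, 6, 1)),
    ("b1", "turtle_B", date(2019, 5, 5)),
]
train_ids, test_ids = time_aware_split(example)
```

In this sketch, identities observed on a single date stay entirely in the training set, which corresponds to a closed-set setting; an open-set variant would instead hold out some identities completely.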
👥 Authors
L. Adam
University of West Bohemia
Vojtěch Čermák
Czech Technical University in Prague
Kostas Papafitsoros
Queen Mary University of London
Mathematics, Mathematical Imaging, Variational Methods, Machine Learning, Sea Turtles
Lukáš Picek
University of West Bohemia and INRIA