🤖 AI Summary
Wildlife re-identification (ReID) lacks large-scale, standardized benchmarks free from data leakage, hindering fair and robust evaluation. Method: We introduce WildlifeReID-10k—the first standardized benchmark of its kind, comprising 33 species, 10,000+ individuals, and 140,000+ images aggregated from 37 existing datasets. We propose a time-aware and similarity-aware split protocol that eliminates visual leakage between training and test sets and supports both closed-set and open-set evaluation. The benchmark combines cross-dataset fusion, similarity-driven partitioning, and a customized evaluation framework, and ships with strong baselines (ResNet and TransReID). Contribution/Results: The benchmark is publicly released on Kaggle. Experiments expose critical performance bottlenecks of state-of-the-art methods under cross-species generalization, pose variation, and occlusion. WildlifeReID-10k significantly enhances the robustness, fairness, and reproducibility of wildlife ReID evaluation.
📝 Abstract
This paper introduces WildlifeReID-10k, a new large-scale re-identification benchmark with more than 10k animal identities of 33 species across more than 140k images, re-sampled from 37 existing datasets. WildlifeReID-10k covers diverse animal species and poses significant challenges for SoTA methods, ensuring fair and robust evaluation through its time-aware and similarity-aware split protocol. The latter is designed to address the common issue of training-to-test data leakage caused by visually similar images appearing in both training and test sets. The WildlifeReID-10k dataset and benchmark are publicly available on Kaggle, along with strong baselines for both closed-set and open-set evaluation, enabling fair, transparent, and standardized evaluation of multi-species animal re-identification models.
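To make the similarity-aware split concrete, here is a minimal sketch of the general idea: cluster each identity's images by visual similarity, then assign whole clusters to either train or test, so near-duplicate images never straddle the split. This is an illustrative implementation, not the paper's actual protocol; the `similarity_aware_split` function, the agglomerative clustering step, and the `dist_threshold` / `test_frac` parameters are assumptions for the sketch.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def similarity_aware_split(features, identities, test_frac=0.3,
                           dist_threshold=0.4, seed=0):
    """Split image indices into (train, test) so that visually similar
    images of the same identity always land on the same side.

    features:   (n, d) embedding per image (e.g. from a pretrained CNN)
    identities: (n,) identity label per image
    """
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for ident in np.unique(identities):
        idx = np.flatnonzero(identities == ident)
        feats = features[idx]
        # L2-normalize so Euclidean distance tracks cosine distance
        feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        if len(idx) == 1:
            clusters = np.array([0])
        else:
            # group near-duplicates: no cluster count, only a distance cutoff
            clusters = AgglomerativeClustering(
                n_clusters=None, distance_threshold=dist_threshold,
                linkage="complete").fit_predict(feats)
        labels = np.unique(clusters)
        rng.shuffle(labels)
        # send a fraction of whole clusters to test; identities with a
        # single cluster stay entirely in train (no leak-free test images)
        n_test = max(1, round(test_frac * len(labels))) if len(labels) > 1 else 0
        test_clusters = set(labels[:n_test])
        for i, c in zip(idx, clusters):
            (test_idx if c in test_clusters else train_idx).append(i)
    return np.array(train_idx), np.array(test_idx)
```

A naive random split would scatter burst-shot near-duplicates across both sets, inflating test accuracy; assigning clusters wholesale is the simplest way to prevent that. The real benchmark additionally uses capture timestamps (the time-aware component), which this sketch omits.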