🤖 AI Summary
Current wildlife multi-animal tracking (MAT) datasets suffer from limited scale, low species diversity, and insufficient spatiotemporal coverage, hindering the development of generalizable models. To address this, we introduce SA-FARI, the largest open-source wildlife MAT benchmark, comprising roughly 46 hours of densely annotated camera-trap video spanning 99 species categories, with 942,702 bounding box annotations and 16,224 masklet identities; anonymized camera-trap locations are published alongside each video to preserve privacy while enabling cross-regional study. This work uniquely unifies high species diversity, broad geographic coverage, and high-fidelity spatiotemporal annotation. Leveraging SA-FARI, we conduct a systematic evaluation of state-of-the-art vision-language models (e.g., SAM 3) and vision-only wildlife-specific methods on detection and tracking. Our benchmark establishes a reproducible foundation for behavioral analysis and population monitoring in ecological conservation.
📝 Abstract
Automated video analysis is critical for wildlife conservation. A foundational task in this domain is multi-animal tracking (MAT), which underpins applications such as individual re-identification and behavior recognition. However, existing datasets are limited in scale, constrained to a few species, or lack sufficient temporal and geographical diversity, leaving no suitable benchmark for training general-purpose MAT models applicable across wild animal populations. To address this, we introduce SA-FARI, the largest open-source MAT dataset for wild animals. It comprises 11,609 camera-trap videos collected over approximately 10 years (2014–2024) from 741 locations across 4 continents, spanning 99 species categories. Each video is exhaustively annotated, culminating in ~46 hours of densely annotated footage containing 16,224 masklet identities and 942,702 individual bounding boxes, segmentation masks, and species labels. Alongside the task-specific annotations, we publish anonymized camera-trap locations for each video. Finally, we present comprehensive benchmarks on SA-FARI using state-of-the-art vision-language models for detection and tracking, including SAM 3, evaluated with both species-specific and generic animal prompts. We also compare against vision-only methods developed specifically for wildlife analysis. SA-FARI is the first large-scale dataset to combine high species diversity, multi-region coverage, and high-quality spatio-temporal annotations, offering a new foundation for advancing generalizable multi-animal tracking in the wild. The dataset is available at https://www.conservationxlabs.com/sa-fari.