🤖 AI Summary
This work addresses label instability in wildlife classification from camera-trap videos, where environmental noise often causes predictions for the same animal to change from frame to frame. The study proposes the first systematic integration of standard multi-object tracking (MOT) into the post-processing pipeline for this domain. The method associates detections across frames to form animal trajectories, then aggregates the per-frame softmax class probabilities along each track. This yields one temporally consistent consensus label per track without retraining the underlying classifier. Evaluated on three MOT datasets, the best-performing trackers improve the weighted F1 score over the standalone classifier by 5.1%, 3.1%, and 2.0%, respectively. These results indicate that enforcing temporal consistency is an effective way to suppress noise-induced prediction errors in wildlife video analysis.
📝 Abstract
Camera traps have become a common tool for wildlife monitoring in ecological research and biodiversity conservation. Wildlife classification models have benefited from the resulting growth in visual data and reach high accuracy on curated, high-quality datasets. However, their performance remains sensitive to real-world environmental conditions, and they often produce inconsistent predictions when run on temporally coherent sequences: the predicted label for a single individual shifts rapidly between frames. This study exploits the temporal nature of camera-trap data to refine the predictions of a wildlife classification model. Specifically, we adopt several standard Multi-Object Tracking (MOT) models to link detections across consecutive frames. The resulting trajectories are used to fuse the softmax class probabilities of the detections they contain. The fused probabilities yield a single consensus class label per trajectory, which overrides misclassifications caused by noise. The experimental results show that the proposed strategy outperforms the standalone classifier on all datasets and on every metric. Specifically, the best-performing MOT models improve the weighted F1-score over the classifier by 5.1%, 3.1%, and 2.0% across three MOT datasets.
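To make the fusion step concrete, the following is a minimal sketch of one way to turn per-frame softmax outputs into a consensus label per track. It assumes the tracker has already assigned a track ID to every detection, and it fuses by simple averaging. The abstract does not state the exact fusion rule, so the averaging, the function name `fuse_track_labels`, the class names, and the sample probabilities are all illustrative assumptions rather than the authors' implementation.

```python
def fuse_track_labels(detections, class_names):
    """Average softmax vectors per track and pick the top class.

    detections: iterable of (track_id, probs), where probs is the
    classifier's softmax output for one detection in one frame.
    Returns {track_id: (consensus_label, fused_probs)}.
    Mean fusion is an assumption; other rules are possible.
    """
    sums = {}    # track_id -> running per-class probability sums
    counts = {}  # track_id -> number of detections seen on the track
    for track_id, probs in detections:
        if len(probs) != len(class_names):
            raise ValueError("probability vector does not match class list")
        if track_id not in sums:
            sums[track_id] = [0.0] * len(class_names)
            counts[track_id] = 0
        for i, p in enumerate(probs):
            sums[track_id][i] += p
        counts[track_id] += 1

    consensus = {}
    for track_id, total in sums.items():
        fused = [s / counts[track_id] for s in total]
        best = max(range(len(fused)), key=fused.__getitem__)
        consensus[track_id] = (class_names[best], fused)
    return consensus


# Hypothetical data: track 1 has one noisy frame whose per-frame
# argmax would be "boar" although the other frames say "deer".
CLASSES = ["deer", "boar", "fox"]
DETECTIONS = [
    (1, [0.70, 0.20, 0.10]),
    (1, [0.30, 0.60, 0.10]),
    (1, [0.80, 0.10, 0.10]),
    (2, [0.10, 0.20, 0.70]),
]

result = fuse_track_labels(DETECTIONS, CLASSES)
for track_id, (label, fused) in sorted(result.items()):
    print(track_id, label, [round(p, 2) for p in fused])
```

In this toy input, the second frame of track 1 would be labeled "boar" if classified on its own, but the averaged probabilities favor "deer", so every detection on the track inherits that label. Averaging is only one option: a confidence-weighted mean or a product of probabilities would fit the same interface.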