🤖 AI Summary
To address the high annotation cost and low data efficiency of primate behavior recognition, this paper proposes Domain-Adaptive Pretraining (DAP), a self-supervised continued-pretraining method that adapts a pretrained V-JEPA model using only unlabeled primate videos. DAP significantly reduces reliance on labeled data while transferring well across diverse primate behavior recognition tasks. Evaluated on two benchmark datasets, PanAf and ChimpACT, DAP improves over published state-of-the-art action recognition models by 6.1 percentage points in accuracy and 6.3 percentage points in mean Average Precision (mAP), respectively. This work is the first to combine JEPA-style self-supervised learning with primate behavioral analysis, establishing a scalable and transferable modeling framework for low-resource animal behavior research.
📝 Abstract
Computer vision for animal behavior offers promising tools to aid research in ecology and cognition and to support conservation efforts. Video camera traps allow for large-scale data collection, but high labeling costs remain a bottleneck to creating large-scale datasets. We thus need data-efficient learning approaches. In this work, we show that self-supervised learning can considerably improve action recognition on primate behavior. On two datasets of great ape behavior (PanAf and ChimpACT), we outperform published state-of-the-art action recognition models by 6.1 %pt. accuracy and 6.3 %pt. mAP, respectively. We achieve this by taking a pretrained V-JEPA model and applying domain-adaptive pretraining (DAP), i.e., continuing the pretraining with in-domain data. We show that most of the performance gain stems from the DAP. Our method promises great potential for improving the recognition of animal behavior, as DAP does not require labeled samples. Code is available at https://github.com/ecker-lab/dap-behavior
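The two-stage recipe the abstract describes — continue self-supervised pretraining on unlabeled in-domain video, then fine-tune on the few labeled samples — can be sketched in miniature. The code below is a hypothetical toy illustration, not the authors' implementation: a tiny NumPy "encoder" stands in for the V-JEPA backbone, the DAP stage regresses embeddings of masked inputs toward embeddings of the clean inputs (a crude JEPA-style objective), and the supervised stage trains only a linear classifier head.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_EMB, N_CLASSES = 32, 16, 4

# Stand-in for a pretrained backbone: one tanh layer with small random weights.
W_enc = rng.normal(0, 0.1, (D_IN, D_EMB))

def encode(x, W):
    return np.tanh(x @ W)

# --- Stage 1: domain-adaptive pretraining (uses no labels) ---
# Predict the embedding of a randomly masked input from the embedding of the
# clean input; the clean-input target is treated as constant per step (a crude
# stand-in for a JEPA stop-gradient / EMA teacher).
def dap_step(W, x, lr=0.05):
    target = encode(x, W)                        # teacher embedding (detached)
    x_masked = x * (rng.random(x.shape) > 0.5)   # random input masking
    pred = encode(x_masked, W)
    err = pred - target
    # gradient of 0.5*||err||^2 w.r.t. W through the masked (student) branch
    grad = x_masked.T @ (err * (1 - pred**2))
    return W - lr * grad / len(x)

unlabeled = rng.normal(size=(256, D_IN))         # "in-domain" unlabeled clips
for _ in range(50):
    W_enc = dap_step(W_enc, unlabeled)

# --- Stage 2: supervised fine-tuning of a linear head on few labels ---
labeled_x = rng.normal(size=(64, D_IN))
labeled_y = np.argmax(labeled_x[:, :N_CLASSES], axis=1)  # toy learnable labels
W_head = np.zeros((D_EMB, N_CLASSES))
for _ in range(200):
    z = encode(labeled_x, W_enc)
    logits = z @ W_head
    p = np.exp(logits - logits.max(1, keepdims=True))
    p /= p.sum(1, keepdims=True)
    p[np.arange(64), labeled_y] -= 1             # softmax cross-entropy grad
    W_head -= 0.1 * z.T @ p / 64

acc = (np.argmax(encode(labeled_x, W_enc) @ W_head, 1) == labeled_y).mean()
print(f"train accuracy after fine-tuning: {acc:.2f}")
```

The key property the paper exploits appears even in this sketch: stage 1 touches only unlabeled data, so scaling it up costs no annotation effort, while the labeled set is needed only for the lightweight second stage.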