🤖 AI Summary
This study addresses the limitations of traditional chimpanzee population estimation, which relies on time-consuming and subjective manual measurement of animal-to-camera distances in camera trap footage. It systematically evaluates the feasibility of monocular depth estimation (MDE) for inferring population density from real-world camera trap data of wild chimpanzees. The authors propose an automated pipeline that combines animal detection with state-of-the-art MDE models, specifically Dense Prediction Transformers (DPT) and Depth Anything, to estimate detection distances automatically, and then feeds these distances into a distance sampling statistical framework. Experimental results show that a calibrated DPT model outperforms Depth Anything in both distance estimation and population inference, yielding population estimates within 22% of those obtained with manual methods. The approach substantially reduces reliance on human annotation and highlights the practical potential of MDE for wildlife monitoring.
📝 Abstract
The estimation of abundance and density in unmarked populations of great apes relies on statistical frameworks that require animal-to-camera distance measurements. In practice, acquiring these distances depends on labour-intensive manual interpretation of animal observations across large camera trap video corpora. This study introduces and evaluates a sparsely explored alternative: the integration of computer vision-based monocular depth estimation (MDE) pipelines directly into ecological camera trap workflows for great ape conservation. Using a real-world dataset of 220 camera trap videos documenting a wild chimpanzee population, we combine two MDE models, Dense Prediction Transformers (DPT) and Depth Anything, with multiple distance sampling strategies. These components are used to generate detection distance estimates, from which population density and abundance are inferred. Comparative analysis against manually derived ground-truth distances shows that calibrated DPT consistently outperforms Depth Anything. This advantage is observed in both distance estimation accuracy and downstream density and abundance inference. Nevertheless, both models exhibit systematic biases. We show that, in complex forest environments, they tend to overestimate detection distances and consequently underestimate density and abundance relative to conventional manual approaches. We further find that failures in animal detection across distance ranges are a primary factor limiting estimation accuracy. Overall, this case study shows that MDE-driven camera trap distance sampling is a viable and practical alternative to manual distance estimation. The proposed approach yields population estimates within 22% of those obtained using traditional methods.
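To illustrate why overestimated distances translate into underestimated density, the sketch below assumes the standard camera trap distance sampling (CTDS) estimator, D = Σn / ((θw²/2) · Σe_k · p̂), with a half-normal detection function and no adjustment terms or availability correction; the study may use other detection functions or corrections. Here θ is the camera's horizontal field of view in radians, w the truncation distance, e_k = T_k / t the number of snapshot moments at camera k, and p̂ the probability of detecting an animal within w. All function names, parameters and values are hypothetical.

```python
# Hypothetical CTDS sketch: half-normal key, no adjustment terms, no availability correction.
import math


def halfnormal_sigma_mle(distances, w, iters=200):
    """Maximum-likelihood sigma of g(r) = exp(-r^2 / (2 sigma^2)) for radial
    (point-transect) distances truncated at w, via golden-section search on log(sigma)."""
    r = [d for d in distances if d <= w]
    n = len(r)
    if n == 0:
        raise ValueError("no detections within truncation distance w")
    sum_r2 = sum(d * d for d in r)

    def neg_loglik(log_sigma):
        s2 = math.exp(2.0 * log_sigma)
        # Normaliser: integral_0^w r g(r) dr = s2 * (1 - exp(-w^2 / (2 s2)))
        norm = -s2 * math.expm1(-w * w / (2.0 * s2))
        # The sum of log(r_i) terms does not depend on sigma and is dropped.
        return sum_r2 / (2.0 * s2) + n * math.log(norm)

    lo, hi = math.log(w * 1e-3), math.log(w * 1e3)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if neg_loglik(a) < neg_loglik(b):
            hi = b
        else:
            lo = a
    return math.exp((lo + hi) / 2.0)


def detection_prob(sigma, w):
    """p = (2 / w^2) * integral_0^w r g(r) dr for the half-normal g."""
    s2 = sigma * sigma
    return -2.0 * s2 * math.expm1(-w * w / (2.0 * s2)) / (w * w)


def ctds_density(distances, w, theta_rad, effort_seconds, snapshot_interval_s):
    """CTDS density in animals per squared distance unit (e.g. per m^2)."""
    n = sum(1 for d in distances if d <= w)
    if n == 0:
        return 0.0
    p = detection_prob(halfnormal_sigma_mle(distances, w), w)
    # Effort e_k = T_k / t snapshot moments per camera, summed over cameras.
    snapshots = sum(T / snapshot_interval_s for T in effort_seconds)
    # Area covered at one snapshot: a circular sector of angle theta and radius w.
    sector_area = theta_rad * w * w / 2.0
    return n / (sector_area * snapshots * p)


if __name__ == "__main__":
    dists_m = [2, 3, 3.5, 4, 5, 6, 7, 8, 9, 11]  # hypothetical detection distances (m)
    kwargs = dict(w=15.0, theta_rad=math.radians(42),
                  effort_seconds=[30 * 86400.0] * 5, snapshot_interval_s=2.0)
    d_true = ctds_density(dists_m, **kwargs) * 1e6  # convert per m^2 to per km^2
    d_inflated = ctds_density([1.2 * d for d in dists_m], **kwargs) * 1e6
    print(f"{d_true:.2f} vs {d_inflated:.2f} animals per km^2")
```

Scaling every distance by 1.2 raises σ̂ and therefore p̂, so the same number of detections is spread over a larger effective area and D̂ falls. This is consistent with the bias reported above, where overestimated MDE distances lead to underestimated density and abundance.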