🤖 AI Summary
Wildlife behavior monitoring is hindered by the scarcity of high-quality, annotated field video data. To address this, we introduce *MammAlps*, a multimodal, multi-view video dataset of alpine wild mammals recorded by 9 camera traps in the Swiss National Park. It comprises over 14 hours of video with audio, 2D segmentation maps, and 8.5 hours of individual tracks densely labeled for species and behavior. We propose two benchmarks: (1) hierarchical multimodal behavior recognition, which takes audio, video, and reference scene segmentation maps as inputs, and (2) ecology-oriented multi-view event recognition, which identifies activities, species, number of individuals, and meteorological conditions while explicitly accounting for real-world confounders such as false-positive triggers. Leveraging synchronized multi-camera acquisition and long-term individual tracking, we release an open-source dataset containing 6,135 single-animal clips and 397 multi-view ecological events. Experiments demonstrate substantial improvements in in-the-wild behavior classification accuracy and joint ecological variable reasoning. All code and data are publicly available.
📝 Abstract
Monitoring wildlife is essential for ecology and ethology, especially in light of the increasing human impact on ecosystems. Camera traps have emerged as habitat-centric sensors enabling the study of wildlife populations at scale with minimal disturbance. However, the lack of annotated video datasets limits the development of the powerful video understanding models needed to process the vast amount of fieldwork data collected. To advance research in wild animal behavior monitoring, we present MammAlps, a multimodal and multi-view dataset of wildlife behavior monitoring from 9 camera traps in the Swiss National Park. MammAlps contains over 14 hours of video with audio, 2D segmentation maps, and 8.5 hours of individual tracks densely labeled for species and behavior. Based on 6,135 single-animal clips, we propose the first hierarchical and multimodal animal behavior recognition benchmark using audio, video, and reference scene segmentation maps as inputs. Furthermore, we propose a second, ecology-oriented benchmark that aims to identify activities, species, number of individuals, and meteorological conditions from 397 multi-view and long-term ecological events, including false-positive triggers. We argue that the two tasks are complementary and contribute to bridging the gap between machine learning and ecology. Code and data are available at: https://github.com/eceo-epfl/MammAlps