🤖 AI Summary
Existing open-source video annotation tools struggle to support both precise localization of individuals and fine-grained annotation of social interactions, which hinders research on animal social behavior as well as the training of computer vision models. To address this, we propose SILVI, the first open-source tool that enables joint annotation of individual identities and interactive behaviors. SILVI integrates behavioral semantic annotation with multi-object tracking and supports temporal synchronization, dynamic scene graph generation, and explicit modeling of interaction relations. Built on a decoupled frontend-backend architecture, it produces structured, machine-readable label data suitable for both animal and human video analysis. SILVI is open source, comes with comprehensive documentation, and has a modular, extensible design; it is intended to make annotating complex social behavior datasets more efficient and to better support downstream model training. By bridging computer vision and behavioral ecology, SILVI facilitates interdisciplinary research and advances the development of socially aware visual understanding systems.
📝 Abstract
Computer vision methods are increasingly used for the automated analysis of large volumes of video data collected through camera traps, drones, or direct observations of animals in the wild. While recent advances have focused primarily on detecting individual actions, much less work has addressed the detection and annotation of interactions, a crucial aspect of social and individualized animal behavior. Existing open-source annotation tools support either behavioral labeling without localization of individuals, or localization without the capacity to capture interactions. To bridge this gap, we present SILVI, an open-source labeling software that integrates both functionalities. SILVI enables researchers to annotate behaviors and interactions directly within video data, generating structured outputs suitable for training and validating computer vision models. By linking behavioral ecology with computer vision, SILVI facilitates the development of automated approaches for fine-grained behavioral analyses. Although developed primarily in the context of animal behavior, SILVI could also be used more broadly to annotate human interactions in other videos that require extracting dynamic scene graphs. The software, along with documentation and download instructions, is available at: https://gitlab.gwdg.de/kanbertay/interaction-labelling-app.
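To make the idea of "structured outputs" and "dynamic scene graphs" more concrete, the following is a minimal sketch of how per-frame localizations and interval-based interaction labels could be combined into one graph per frame. It assumes a hypothetical schema: the `Detection` and `Interaction` classes and the `dynamic_scene_graph` helper are illustrative only and do not describe SILVI's actual export format, which is documented in the project repository. Nodes are the track IDs visible in a frame, and edges are `(subject, behavior, object)` triples whose interaction interval covers that frame.

```python
import json
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One tracked individual's bounding box in one frame (hypothetical schema)."""
    frame: int
    track_id: str
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Interaction:
    """A directed interaction between two tracked individuals over an inclusive frame range."""
    subject_id: str
    object_id: str
    behavior: str
    start_frame: int
    end_frame: int


def dynamic_scene_graph(detections, interactions):
    """Build one graph per annotated frame: nodes are visible track IDs,
    edges are (subject, behavior, object) triples active in that frame."""
    visible = defaultdict(set)
    for det in detections:
        visible[det.frame].add(det.track_id)

    graphs = {}
    for frame in sorted(visible):
        present = visible[frame]
        # Keep an interaction only if it spans this frame and both partners are visible.
        edges = sorted(
            (i.subject_id, i.behavior, i.object_id)
            for i in interactions
            if i.start_frame <= frame <= i.end_frame
            and i.subject_id in present
            and i.object_id in present
        )
        graphs[frame] = {"nodes": sorted(present), "edges": edges}
    return graphs


# Toy example: individual "B" leaves the field of view at frame 2.
detections = [
    Detection(0, "A", 10, 20, 50, 40), Detection(0, "B", 70, 25, 45, 38),
    Detection(1, "A", 12, 21, 50, 40), Detection(1, "B", 68, 24, 45, 38),
    Detection(2, "A", 15, 22, 50, 40),
]
interactions = [Interaction("A", "B", "groom", 0, 2)]

graphs = dynamic_scene_graph(detections, interactions)
print(json.dumps(graphs, indent=2))
```

In this sketch the grooming edge appears in frames 0 and 1 but is dropped in frame 2, because the partner is no longer localized. Storing interactions as intervals over track IDs, rather than as per-frame labels, keeps the annotation compact while still allowing a frame-by-frame graph to be derived for model training.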