🤖 AI Summary
Existing approaches to animal and human behavior analysis are limited by handcrafted features and the absence of a universal action representation, resulting in poor generalization. This work proposes a Universal Action Space (UAS) that, for the first time, leverages human action semantics as a visual dictionary. By employing deep neural networks trained on large-scale human action data, the method learns high-level action embeddings and transfers them to behavior recognition tasks in mammals and chimpanzees. Experiments demonstrate that this approach achieves efficient and unified behavioral classification across multiple animal behavior datasets, confirming the cross-species generalizability and practical utility of UAS.
📝 Abstract
Analyzing animal and human behavior has long been a challenging task in computer vision. Early approaches from the 1970s to the 1990s relied on hand-crafted edge detection, segmentation, and low-level features such as color, shape, and texture to locate objects and infer their identities, an inherently ill-posed problem. Behavior analysis in this era typically proceeded by tracking identified objects over time and modeling their trajectories using sparse feature points, which further limited robustness and generalization. A major shift occurred with the introduction of ImageNet by Deng, Li, and colleagues in 2009, which enabled large-scale visual recognition through deep neural networks and effectively served as a comprehensive visual dictionary. This development allowed object recognition to move beyond complex low-level processing toward learned high-level representations. In this work, we follow this paradigm to build a large-scale Universal Action Space (UAS) from existing labeled human-action datasets. We then use this UAS as the foundation for analyzing and categorizing mammalian and chimpanzee behavior datasets. The source code is released on GitHub at https://github.com/franktpmvu/Universal-Action-Space.
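The transfer paradigm the abstract describes, a network trained on labeled human actions whose embedding space is then reused as a fixed "dictionary" for animal behavior, can be sketched in miniature. The following is a toy illustration only, not the authors' implementation: `frozen_embed` stands in for a pretrained human-action network with frozen weights, the clip features are synthetic, and animal behaviors are assigned by nearest class centroid in the shared embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_embed(clip, W):
    """Stand-in for a frozen, human-action-pretrained network:
    projects raw clip features into the shared action-embedding space."""
    return np.tanh(clip @ W)

# Hypothetical dimensions: 512-dim raw clip features, 64-dim embeddings.
W = rng.standard_normal((512, 64)) / np.sqrt(512)

# Toy labeled "animal behavior" clips, a few examples per behavior;
# each behavior cluster is offset so the classes are separable.
behaviors = ["walking", "grooming", "feeding"]
train_clips = {b: rng.standard_normal((5, 512)) + 3.0 * i
               for i, b in enumerate(behaviors)}

# One centroid per behavior, computed in the frozen embedding space.
centroids = {b: frozen_embed(c, W).mean(axis=0)
             for b, c in train_clips.items()}

def classify(clip):
    """Label a clip with the behavior whose centroid is nearest."""
    z = frozen_embed(clip, W)
    return min(centroids, key=lambda b: np.linalg.norm(z - centroids[b]))

# A held-out clip drawn near one cluster maps through the frozen
# embedding and is classified without retraining the backbone.
query = rng.standard_normal(512) + 3.0 * behaviors.index("feeding")
label = classify(query)
```

The key design point the abstract argues for is that the embedding network itself never changes across species: only the lightweight classifier (here, centroids) is fit per dataset, which is what makes the action space "universal."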