🤖 AI Summary
This survey systematically reviews 176 animal pose estimation (APE) studies published between 2011 and 2023, addressing challenges in modeling, benchmarking, and application under multimodal inputs, including RGB images, LiDAR, infrared, IMU, acoustic signals, and language prompts. Methodologically, we propose the first unified multimodal APE taxonomy covering both 2D and 3D formulations; uncover bidirectional technical transfer patterns between human and animal pose estimation; and establish a cross-modal evaluation framework that harmonizes supervised, self-supervised, and weakly supervised paradigms via standardized experimental protocols. As key contributions, we release an open-source multimodal APE benchmark, comprising curated datasets, reproducible codebases, and a continuously updated GitHub repository, designed to support rigorous, reproducible research in neuroscience, biomechanics, and veterinary medicine.
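As a concrete illustration, the taxonomy axes named above (input modalities, 2D/3D output form, learning paradigm, application domain) can be encoded as a simple record type. The sketch below is illustrative only: the class names and the example entry are hypothetical, and only the axes themselves come from the survey.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    RGB = "rgb"; LIDAR = "lidar"; INFRARED = "infrared"
    IMU = "imu"; ACOUSTIC = "acoustic"; LANGUAGE = "language"

class OutputForm(Enum):
    KEYPOINTS_2D = "2d"; KEYPOINTS_3D = "3d"

class Paradigm(Enum):
    SUPERVISED = "supervised"
    SELF_SUPERVISED = "self-supervised"
    WEAKLY_SUPERVISED = "weakly-supervised"

@dataclass
class APEMethod:
    """One entry in a multimodal APE taxonomy (hypothetical schema)."""
    name: str
    modalities: list[Modality]  # one or more input sensors/modalities
    output: OutputForm          # 2D or 3D pose formulation
    paradigm: Paradigm          # supervision regime
    application: str            # e.g. neuroscience, biomechanics, veterinary

# Purely illustrative example entry; not a method from the survey.
example = APEMethod(
    name="HypotheticalNet",
    modalities=[Modality.RGB, Modality.LIDAR],
    output=OutputForm.KEYPOINTS_3D,
    paradigm=Paradigm.SUPERVISED,
    application="biomechanics",
)
print(example)
```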
📝 Abstract
Animal pose estimation (APE) aims to locate an animal's body parts using a diverse array of sensor and modality inputs (e.g., RGB cameras, LiDAR, infrared, IMU, acoustic, and language cues), which is crucial for research across neuroscience, biomechanics, and veterinary medicine. Evaluating 176 papers published since 2011, this survey categorises APE methods by their input sensor and modality types, output forms, learning paradigms, experimental setups, and application domains, presenting detailed analyses of current trends, challenges, and future directions in single- and multi-modality APE systems. The analysis also highlights the technical transfer between human and animal pose estimation, and how innovations in APE can reciprocally enrich human pose estimation and the broader machine learning field. Additionally, 2D and 3D APE datasets and evaluation metrics based on different sensors and modalities are provided. A regularly updated project page is available at: https://github.com/ChennyDeng/MM-APE.
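For orientation, the snippet below sketches two evaluation metrics that are standard in the 2D and 3D pose estimation literature, PCK (Percentage of Correct Keypoints) and MPJPE (Mean Per Joint Position Error). It is a minimal NumPy illustration of these common metric families, assuming simple array inputs; it is not the survey's specific cross-modal evaluation protocol.

```python
import numpy as np

def pck(pred, gt, threshold, visible=None):
    """Percentage of Correct Keypoints (2D): a prediction counts as correct
    if it lies within `threshold` of the ground truth. In practice the
    threshold is often normalised by a reference length (e.g. bounding-box
    size), giving variants such as PCK@0.05.

    pred, gt: (num_keypoints, 2) arrays; visible: optional boolean mask.
    """
    dist = np.linalg.norm(pred - gt, axis=-1)  # per-keypoint pixel error
    if visible is not None:
        dist = dist[visible]                   # score visible joints only
    return float(np.mean(dist <= threshold))

def mpjpe(pred, gt):
    """Mean Per Joint Position Error (3D): average Euclidean distance
    between predicted and ground-truth joints, in the input's units
    (typically millimetres).

    pred, gt: (num_keypoints, 3) arrays.
    """
    return float(np.mean(np.linalg.norm(pred - gt, axis=-1)))

# Toy usage with synthetic keypoints (illustrative only).
rng = np.random.default_rng(0)
gt2d = rng.uniform(0, 256, size=(17, 2))
pred2d = gt2d + rng.normal(0, 4, size=(17, 2))
print(f"PCK@10px: {pck(pred2d, gt2d, threshold=10.0):.3f}")

gt3d = rng.uniform(0, 1, size=(17, 3))
pred3d = gt3d + rng.normal(0, 0.02, size=(17, 3))
print(f"MPJPE: {mpjpe(pred3d, gt3d):.4f}")
```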