🤖 AI Summary
This work addresses the challenge of stable tracking of high-density wild horses in aerial videos, where small target sizes, highly variable poses, and complex backgrounds hinder performance. Traditional axis-aligned bounding boxes struggle to maintain consistent tracking, while existing rotated bounding box methods—limited to 180° angle representations—fail to distinguish head from tail, often causing track fragmentation. To overcome this, the authors propose an oriented bounding box (OBB)-based head orientation estimation approach. By cropping the central region of each OBB and integrating dedicated head, tail, and joint head-tail detectors, the method employs an IoU-based majority voting scheme to resolve frame-to-frame 180° ambiguities and ensure consistent orientation. Evaluated on a test set of 299 images, the approach achieves a 99.3% orientation accuracy, significantly enhancing tracking robustness and continuity in dense animal groups.
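The voting scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the box format, the idea of a "front zone" (the end of the OBB the current angle points at), and the 0.5 IoU threshold are all assumptions introduced for the example.

```python
# Hypothetical sketch of IoU-based majority voting over three detectors
# (head, tail, joint head-tail) to decide whether an OBB's 180-degree-
# ambiguous angle should be kept or flipped. Details are illustrative.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def vote_orientation(head_dets, tail_dets, headtail_dets, front_zone, thr=0.5):
    """Each detector casts one vote: True ('keep' the current angle) if its
    head evidence overlaps the front zone, False ('flip') otherwise.
    head_dets / tail_dets: lists of boxes; headtail_dets: (label, box) pairs."""
    votes = []
    # Head detector: a head box in the front zone supports the current angle.
    if head_dets:
        votes.append(any(iou(b, front_zone) >= thr for b in head_dets))
    # Tail detector: a tail box in the front zone argues for a flip.
    if tail_dets:
        votes.append(not any(iou(b, front_zone) >= thr for b in tail_dets))
    # Joint head-tail detector: only its head-labelled boxes are consulted.
    heads = [b for lbl, b in headtail_dets if lbl == "head"]
    if heads:
        votes.append(any(iou(b, front_zone) >= thr for b in heads))
    if not votes:
        return "keep"  # no evidence in this frame: leave the angle unchanged
    return "keep" if 2 * sum(votes) >= len(votes) else "flip"
```

Taking the majority over independent detectors means a single false detection in one crop cannot flip the track's orientation on its own, which matches the paper's goal of suppressing frame-to-frame 180° flips.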
📝 Abstract
The social structures of group-living animals such as feral horses are diverse and remain insufficiently understood, even within a single species. To investigate group dynamics, aerial videos are often utilized to track individuals and analyze their movement trajectories, which are essential for evaluating inter-individual interactions and comparing social behaviors. Accurate individual tracking is therefore crucial. In multi-animal tracking, axis-aligned bounding boxes (bboxes) are widely used; however, for aerial top-view footage of entire groups, their performance degrades due to complex backgrounds, small target sizes, high animal density, and varying body orientations. To address this issue, we employ oriented bounding boxes (OBBs), which include rotation angles and reduce unnecessary background. Nevertheless, current OBB detectors such as YOLO-OBB restrict angles to a 180$^{\circ}$ range, making it impossible to distinguish head from tail and often causing sudden 180$^{\circ}$ flips across frames, which severely disrupts continuous tracking. To overcome this limitation, we propose a head-orientation estimation method that crops OBB-centered patches, applies three detectors (head, tail, and head-tail), and determines the final label through IoU-based majority voting. Experiments using 299 test images show that our method achieves 99.3% accuracy, outperforming the individual models and demonstrating its effectiveness for robust OBB-based tracking.