Animal Pose Labeling Using General-Purpose Point Trackers

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing animal pose estimation methods generalize poorly because their training datasets do not cover the full range of animal behaviors, and such datasets are hard to collect given the large variation in morphology across species. To address this, the paper proposes a test-time optimization pipeline for animal pose labeling: given a video and a sparse set of annotated frames, obtained either from human labelers or from an off-the-shelf pose detector, it fine-tunes a lightweight appearance embedding inside a pre-trained general-purpose point tracker and then applies the fine-tuned model to the remaining frames. The authors report state-of-the-art performance at a reasonable annotation cost and position the pipeline as a tool for the automatic quantification of animal behavior.

📝 Abstract
Automatically estimating animal poses from videos is important for studying animal behaviors. Existing methods do not perform reliably since they are trained on datasets that are not comprehensive enough to capture all necessary animal behaviors. However, it is very challenging to collect such datasets due to the large variations in animal morphology. In this paper, we propose an animal pose labeling pipeline that follows a different strategy, i.e., test-time optimization. Given a video, we fine-tune a lightweight appearance embedding inside a pre-trained general-purpose point tracker on a sparse set of annotated frames. These annotations can be obtained from human labelers or off-the-shelf pose detectors. The fine-tuned model is then applied to the rest of the frames for automatic labeling. Our method achieves state-of-the-art performance at a reasonable annotation cost. We believe our pipeline offers a valuable tool for the automatic quantification of animal behavior. Visit our project webpage at https://zhuoyang-pan.github.io/animal-labeling.
Problem

Research questions and friction points this paper is trying to address.

Automatically estimating animal poses from videos
Overcoming limitations of existing methods with test-time optimization
Reducing annotation cost while improving pose labeling accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses general-purpose point trackers
Fine-tunes lightweight appearance embedding
Applies test-time optimization strategy
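
The idea of test-time optimization on sparse labels can be illustrated with a toy sketch. This is not the authors' tracker or training code: the "tracker" here is a hypothetical frozen similarity matcher over hand-made candidate features, and the "lightweight appearance embedding" is approximated by a small learned residual on the query vector. All names (`fine_tune`, `make_frame`, `base_query`) and the synthetic data are assumptions made for illustration only.

```python
import math

def similarity_scores(query, frame_feats):
    """Dot product between the query embedding and each candidate feature."""
    return [sum(q * f for q, f in zip(query, feat)) for feat in frame_feats]

def predict(query, frame_feats):
    """Toy frozen 'tracker': pick the candidate most similar to the query."""
    scores = similarity_scores(query, frame_feats)
    return max(range(len(scores)), key=scores.__getitem__)

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fine_tune(base_query, frames, labels, steps=200, lr=0.5):
    """Learn a residual on the query using only the sparsely labeled frames."""
    delta = [0.0] * len(base_query)
    for _ in range(steps):
        query = [b + d for b, d in zip(base_query, delta)]
        grad = [0.0] * len(delta)
        for t, target in labels.items():
            probs = softmax(similarity_scores(query, frames[t]))
            # Cross-entropy gradient w.r.t. the query: sum_i (p_i - y_i) * feat_i
            for i, feat in enumerate(frames[t]):
                coeff = probs[i] - (1.0 if i == target else 0.0)
                for k, f in enumerate(feat):
                    grad[k] += coeff * f
        delta = [d - lr * g / len(labels) for d, g in zip(delta, grad)]
    return [b + d for b, d in zip(base_query, delta)]

def make_frame(t):
    """Synthetic frame: one true keypoint feature plus two distractors."""
    wobble = 0.1 * math.sin(t)
    keypoint = [1.0, 0.0, wobble]
    feats = [[0.0, 1.0, wobble], [0.0, 0.0, 1.0]]
    target = t % 3                      # keypoint position changes per frame
    feats.insert(target, keypoint)
    return feats, target

video = [make_frame(t) for t in range(12)]
frames = [feats for feats, _ in video]
truth = [target for _, target in video]

# Sparse annotations: only 3 of the 12 frames carry a label.
labels = {t: truth[t] for t in (0, 5, 11)}

def accuracy(query):
    hits = sum(predict(query, f) == y for f, y in zip(frames, truth))
    return hits / len(frames)

base_query = [0.2, 1.0, 0.0]            # generic query that locks onto a distractor
acc_before = accuracy(base_query)
tuned_query = fine_tune(base_query, frames, labels)
acc_after = accuracy(tuned_query)
print(acc_before, acc_after)
```

The candidate features stay fixed throughout, which mirrors the idea of keeping the pre-trained tracker frozen and adapting only a small per-video component. Accuracy is measured over all frames, including those that were never labeled.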