🤖 AI Summary
Cross-species mammalian pose estimation is difficult because appearance, anatomy, and motion patterns vary across species, which leads to non-rigid deformations, occlusions, and scarce annotations. To address these challenges, we propose the Keypoint Interaction Transformer (KIT), which explicitly models anatomical constraints and inter-joint dependencies. The method integrates structure-aware graph attention, multi-scale feature alignment, and self-supervised keypoint relation distillation, enabling zero-shot generalization to unseen species. Evaluated on a large-scale cross-species benchmark covering 12 mammalian species, KIT achieves an average 8.3% improvement in PCKh and reduces cross-species transfer error by 37% over state-of-the-art general-purpose pose models. Our core contribution is the first end-to-end keypoint interaction modeling framework tailored for cross-species mammalian pose estimation, one that balances structural priors with data efficiency.
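For readers unfamiliar with the PCKh metric cited above, the following is a minimal sketch of how it is commonly computed: a predicted keypoint counts as correct when its distance to the ground truth is within a fraction `alpha` of the instance's head-segment length. The function name `pckh`, the argument layout, and the default `alpha=0.5` (the usual convention for this metric) are assumptions for illustration; the summary does not state which threshold or head-size definition the benchmark uses.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5, visible=None):
    """Fraction of keypoints whose error is within alpha * head size.

    pred, gt:    one list per instance, each a list of (x, y) keypoints.
    head_sizes:  one head-segment length per instance (same units as x, y).
    alpha:       threshold factor; 0.5 is assumed here, not taken from the source.
    visible:     optional per-instance lists of booleans; keypoints marked
                 False (e.g. unlabelled joints) are skipped.
    """
    correct = 0
    total = 0
    for i, (pred_inst, gt_inst) in enumerate(zip(pred, gt)):
        threshold = alpha * head_sizes[i]
        for k, ((px, py), (gx, gy)) in enumerate(zip(pred_inst, gt_inst)):
            if visible is not None and not visible[i][k]:
                continue
            total += 1
            # Euclidean distance between prediction and ground truth.
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    # No scored keypoints: the metric is undefined.
    return correct / total if total else float("nan")


# Hypothetical toy data: one animal, three keypoints, head size 10 px.
pred = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]]
gt = [[(10.0, 12.0), (20.0, 29.0), (30.0, 30.0)]]
score = pckh(pred, gt, head_sizes=[10.0])
```

In the toy data the threshold is 5 px and the per-keypoint errors are 2, 9, and 0 px, so two of the three keypoints are counted as correct. Normalizing by head size rather than image size keeps the metric comparable across animals that appear at different scales.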