FSMC-Pose: Frequency and Spatial Fusion with Multiscale Self-calibration for Cattle Mounting Pose Estimation

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses cattle mounting pose estimation in real-world scenes with cluttered backgrounds and frequent inter-animal occlusion. It proposes FSMC-Pose, a top-down framework that pairs a lightweight frequency-spatial fusion backbone, CattleMountNet, with a multiscale Spatial-Channel Self-Calibration Head (SC2Head). CattleMountNet combines a Spatial Frequency Enhancement Block (SFEBlock), which separates cattle from cluttered backgrounds, with a Receptive Aggregation Block (RABlock), which captures multiscale contextual information. SC2Head attends to spatial and channel dependencies and adds a self-calibration branch to mitigate structural misalignment under inter-animal overlap. The authors also construct MOUNT-Cattle, a COCO-format dataset covering 1176 mounting instances. On a dataset that combines MOUNT-Cattle with the public NWAFU-Cattle dataset, FSMC-Pose achieves higher accuracy than strong baselines with markedly lower computational and parameter costs, while maintaining real-time inference on commodity GPUs.

📝 Abstract
Mounting posture is an important visual indicator of estrus in dairy cattle. However, achieving reliable mounting pose estimation in real-world environments remains challenging due to cluttered backgrounds and frequent inter-animal occlusion. We present FSMC-Pose, a top-down framework that integrates a lightweight frequency-spatial fusion backbone, CattleMountNet, and a multiscale self-calibration head, SC2Head. Specifically, we design two algorithmic components for CattleMountNet: the Spatial Frequency Enhancement Block (SFEBlock) and the Receptive Aggregation Block (RABlock). SFEBlock separates cattle from cluttered backgrounds, while RABlock captures multiscale contextual information. The Spatial-Channel Self-Calibration Head (SC2Head) attends to spatial and channel dependencies and introduces a self-calibration branch to mitigate structural misalignment under inter-animal overlap. We construct a mounting dataset, MOUNT-Cattle, covering 1176 mounting instances, which follows the COCO format and supports drop-in training across pose estimation models. Using a comprehensive dataset that combines MOUNT-Cattle with the public NWAFU-Cattle dataset, FSMC-Pose achieves higher accuracy than strong baselines, with markedly lower computational and parameter costs, while maintaining real-time inference on commodity GPUs. Extensive experiments and qualitative analyses show that FSMC-Pose effectively captures and estimates cattle mounting pose in complex and cluttered environments. Dataset and code are available at https://github.com/elianafang/FSMC-Pose.
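Since the abstract states that MOUNT-Cattle follows the COCO format, the sketch below shows how a COCO-style keypoint annotation is typically laid out and read. It is an illustration only: the keypoint names, coordinates, and the helper functions `parse_keypoints` and `count_labeled` are hypothetical, and the actual keypoint schema of MOUNT-Cattle is not given here. The only convention assumed is the standard COCO one, where `keypoints` is a flat list of `(x, y, v)` triples and `v` is 0 (not labeled), 1 (labeled but not visible), or 2 (labeled and visible).

```python
# Hypothetical keypoint names; the real MOUNT-Cattle schema is not stated in the source.
KEYPOINT_NAMES = ["nose", "neck", "tail_base"]


def parse_keypoints(flat):
    """Split COCO's flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints whose visibility flag v is greater than 0 (labeled)."""
    return sum(1 for _, _, v in parse_keypoints(flat) if v > 0)


# Made-up annotation for one instance, for illustration only.
annotation = {
    "id": 1,
    "image_id": 1,
    "category_id": 1,
    "bbox": [50.0, 40.0, 200.0, 120.0],  # COCO convention: [x, y, width, height]
    "keypoints": [60, 50, 2, 120, 70, 1, 0, 0, 0],
    "num_keypoints": 2,
}

triples = parse_keypoints(annotation["keypoints"])
for name, (x, y, v) in zip(KEYPOINT_NAMES, triples):
    print(name, x, y, v)
```

In this layout, an occluded joint during mounting would usually be stored with `v = 1` rather than dropped, so `num_keypoints` still counts it as labeled.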
Problem

Research questions and friction points this paper is trying to address.

cattle mounting pose estimation
cluttered backgrounds
inter-animal occlusion
estrus detection
pose estimation in complex environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency-spatial fusion
multiscale self-calibration
cattle mounting pose estimation
occlusion handling
lightweight pose network