MSGL-Transformer: A Multi-Scale Global-Local Transformer for Rodent Social Behavior Recognition

📅 2026-04-08

📈 Citations: 0

✨ Influential: 0

career value

201K/year

🤖 AI Summary

This study addresses the challenges of time-consuming and error-prone manual annotation in rodent social behavior recognition by proposing a lightweight multi-scale global–local Transformer model. The method explicitly captures behavioral dynamics across multiple temporal scales through parallel short-range, mid-range, and global attention branches, and incorporates a Behavior-Aware Modulation (BAM) module to enhance discriminative feature representation. As the first approach to achieve cross-dataset generalization within a unified architecture without task-specific fine-tuning, the model attains 75.4% accuracy (F1 = 0.745) on RatSI and 87.1% accuracy (F1 = 0.8745) on CalMS21, significantly outperforming prevailing methods such as TCN, LSTM, and ST-GCN.

Technology Category

Application Category

📝 Abstract

Recognition of rodent behavior is important for understanding neural and behavioral mechanisms. Traditional manual scoring is time-consuming and prone to human error. We propose MSGL-Transformer, a Multi-Scale Global-Local Transformer for recognizing rodent social behaviors from pose-based temporal sequences. The model employs a lightweight transformer encoder with multi-scale attention to capture motion dynamics across different temporal scales. The architecture integrates parallel short-range, medium-range, and global attention branches to explicitly capture behavior dynamics at multiple temporal scales. We also introduce a Behavior-Aware Modulation (BAM) block, inspired by SE-Networks, which modulates temporal embeddings to emphasize behavior-relevant features prior to attention. We evaluate on two datasets: RatSI (5 behavior classes, 12D pose inputs) and CalMS21 (4 behavior classes, 28D pose inputs). On RatSI, MSGL-Transformer achieves 75.4% mean accuracy and F1-score of 0.745 across nine cross-validation splits, outperforming TCN, LSTM, and Bi-LSTM. On CalMS21, it achieves 87.1% accuracy and F1-score of 0.8745, a +10.7% improvement over HSTWFormer, and outperforms ST-GCN, MS-G3D, CTR-GCN, and STGAT. The same architecture generalizes across both datasets with only input dimensionality and number of classes adjusted.

Problem

Research questions and friction points this paper is trying to address.

rodent social behavior recognition

pose-based temporal sequences

behavior dynamics

multi-scale modeling

automated behavior analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Scale Attention

Global-Local Transformer

Behavior-Aware Modulation