Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding

📅 2025-07-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the scarcity of high-quality labeled data and poor cross-species generalization in animal behavior analysis, this paper introduces BEAST: a self-supervised pretraining framework for neural behavioral videos. Its core innovation lies in jointly leveraging masked image modeling (MIM) and temporal contrastive learning (TCL) to establish an experiment-specific visual Transformer pretraining paradigm. BEAST unifies support for single- and multi-animal pose estimation, fine-grained behavior segmentation, and neural activity–correlated feature extraction. Evaluated across multiple neural behavioral datasets, BEAST consistently outperforms state-of-the-art supervised and self-supervised methods—achieving performance gains of 12.7%–23.4% under low-labeling regimes (<10% annotated data) and demonstrating strong cross-species generalization. This work establishes a scalable, annotation-efficient, and computationally effective video analysis paradigm for neuroscience.

Technology Category

Application Category

📝 Abstract
The brain can only be fully understood through the lens of the behavior it generates -- a guiding principle in modern neuroscience research that nevertheless presents significant technical challenges. Many studies capture behavior with cameras, but video analysis approaches typically rely on specialized models requiring extensive labeled data. We address this limitation with BEAST (BEhavioral Analysis via Self-supervised pretraining of Transformers), a novel and scalable framework that pretrains experiment-specific vision transformers for diverse neuro-behavior analyses. BEAST combines masked autoencoding with temporal contrastive learning to effectively leverage unlabeled video data. Through comprehensive evaluation across multiple species, we demonstrate improved performance in three critical neuro-behavioral tasks: extracting behavioral features that correlate with neural activity, and pose estimation and action segmentation in both the single- and multi-animal settings. Our method establishes a powerful and versatile backbone model that accelerates behavioral analysis in scenarios where labeled data remains scarce.
Problem

Research questions and friction points this paper is trying to address.

Self-supervised pretraining for animal behavior analysis without extensive labeled data
Improving behavioral feature extraction correlated with neural activity
Enhancing pose estimation and action segmentation in single- and multi-animal settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised pretraining of vision transformers
Combines masked autoencoding with contrastive learning
Versatile backbone model for scarce labeled data
🔎 Similar Papers
No similar papers found.
Y
Yanchen Wang
Columbia University, New York
H
Han Yu
Columbia University, New York
A
Ari Blau
Columbia University, New York
Y
Yizi Zhang
Columbia University, New York
T
The International Brain Laboratory
The International Brain Laboratory
Liam Paninski
Liam Paninski
Columbia University
Neural data science
Cole Hurwitz
Cole Hurwitz
Postdoctoral Research Scientist, Zuckerman Institute, Columbia University
Foundation modelsNeural Data AnalysisSpike sorting
Matthew R Whiteway
Matthew R Whiteway
Columbia University
computational neurosciencebehavioral analysismachine learning