🤖 AI Summary
This work addresses a limitation of conventional deep learning models: they propagate information solely through activation values, overlooking the joint dynamics of firing rates and phases inherent in neural activity, which constrains their capacity for structured understanding and coordination. To overcome this, the study introduces Kuramoto oscillators into the Vision Transformer architecture for the first time, proposing a Kuramoto oscillatory Phase Encoding (KoPE) mechanism that enhances attention through dynamic phase synchronization. While preserving architectural simplicity, the approach substantially improves performance across diverse tasks, including semantic and panoptic segmentation, language-aligned representation learning, and few-shot abstract visual reasoning on ARC-AGI, while also delivering gains in training, parameter, and data efficiency.
📝 Abstract
Spatiotemporal neural dynamics and oscillatory synchronization are widely implicated in biological information processing and have been hypothesized to support flexible coordination such as feature binding. By contrast, most deep learning architectures represent and propagate information through activation values alone, neglecting the joint dynamics of rate and phase. In this work, we introduce Kuramoto oscillatory Phase Encoding (KoPE), which augments Vision Transformers with an additional, evolving phase state, incorporating a neuro-inspired synchronization mechanism to advance learning efficiency. We show that KoPE can improve the training, parameter, and data efficiency of vision models through synchronization-enhanced structure learning. Moreover, KoPE benefits tasks requiring structured understanding, including semantic and panoptic segmentation, representation alignment with language, and few-shot abstract visual reasoning (ARC-AGI). Theoretical analysis and empirical verification further suggest that KoPE can accelerate attention concentration, improving learning efficiency. These results indicate that synchronization can serve as a scalable, neuro-inspired mechanism for advancing state-of-the-art neural network models.
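The abstract does not spell out KoPE's update rule, but the underlying Kuramoto dynamics it builds on are standard: each oscillator's phase drifts at its natural frequency while being pulled toward the phases of coupled oscillators, driving the population toward synchronization. The sketch below simulates the classic mean-field Kuramoto model and tracks the order parameter (a standard measure of phase coherence); it illustrates the synchronization phenomenon only, not the paper's KoPE mechanism, and all names and constants here are illustrative assumptions.

```python
import numpy as np

def kuramoto_step(theta, omega, K, dt):
    # Classic mean-field Kuramoto update (Euler step):
    #   dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
    diffs = theta[None, :] - theta[:, None]   # pairwise phase differences
    coupling = np.sin(diffs).mean(axis=1)     # (1/N) * sum_j sin(theta_j - theta_i)
    return theta + dt * (omega + K * coupling)

def order_parameter(theta):
    # r in [0, 1]: ~0 means incoherent phases, ~1 means full synchronization.
    return np.abs(np.exp(1j * theta).mean())

rng = np.random.default_rng(0)
N = 64
theta = rng.uniform(0.0, 2.0 * np.pi, N)      # random initial phases
omega = rng.normal(0.0, 0.1, N)               # narrow spread of natural frequencies

r_start = order_parameter(theta)
for _ in range(500):                          # coupling K=1.0 is well above critical
    theta = kuramoto_step(theta, omega, K=1.0, dt=0.05)
r_end = order_parameter(theta)

print(f"order parameter: {r_start:.2f} -> {r_end:.2f}")
```

With coupling strength well above the critical value for this frequency spread, the order parameter rises from near-incoherence toward 1, which is the synchronization effect the paper leverages for structure learning in attention.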