Rotation Equivariant Mamba for Vision Tasks

📅 2026-03-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the sensitivity of existing Vision Mamba models to image rotation, a limitation stemming from their neglect of rotational symmetry that hampers generalization. To overcome this, we propose EQ-VMamba, the first rotation-equivariant Vision Mamba model, which integrates a rotation-equivariant cross-scan strategy and group-based Mamba blocks to achieve strict end-to-end rotational equivariance. We further provide a theoretical analysis of the equivariance error. Extensive experiments demonstrate that EQ-VMamba matches or surpasses non-equivariant baselines across image classification, semantic segmentation, and super-resolution, while reducing parameter count by approximately 50%, thereby significantly enhancing both rotational robustness and parameter efficiency.

📝 Abstract
Rotation equivariance constitutes one of the most general and crucial structural priors for visual data, yet it remains notably absent from current Mamba-based vision architectures. Despite the success of Mamba in natural language processing and its growing adoption in computer vision, existing visual Mamba models fail to account for rotational symmetry in their design. This omission renders them inherently sensitive to image rotations, thereby constraining their robustness and cross-task generalization. To address this limitation, we propose to incorporate rotation symmetry, a universal and fundamental geometric prior in images, into Mamba-based architectures. Specifically, we introduce EQ-VMamba, the first rotation-equivariant visual Mamba architecture. The core components of EQ-VMamba are a carefully designed rotation-equivariant cross-scan strategy and group Mamba blocks. Moreover, we provide a rigorous theoretical analysis of the intrinsic equivariance error, demonstrating that the proposed architecture enforces end-to-end rotation equivariance throughout the network. Extensive experiments across multiple benchmarks (high-level image classification, mid-level semantic segmentation, and low-level image super-resolution) demonstrate that EQ-VMamba achieves superior or competitive performance compared to non-equivariant baselines while requiring approximately 50% fewer parameters. These results indicate that embedding rotation equivariance not only effectively bolsters the robustness of visual Mamba models against rotation transformations, but also enhances overall performance with significantly improved parameter efficiency. Code is available at https://github.com/zhongchenzhao/EQ-VMamba.
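The page does not reproduce the paper's implementation, but the core idea behind a rotation-equivariant cross-scan can be sketched as group averaging over the C4 rotation group: run a directional scan on each 90°-rotated copy of the feature map, rotate each output back, and combine. The NumPy sketch below is an assumption-laden illustration of that mechanism, not the authors' code; in particular, `scan` is a stand-in (a row-major cumulative sum) for a Mamba selective scan, and all function names are hypothetical.

```python
import numpy as np

def scan(x):
    """Placeholder directional scan (row-major cumulative sum).
    Stands in for a Mamba selective scan over flattened patches;
    note it is NOT rotation equivariant on its own."""
    return np.cumsum(x.reshape(-1)).reshape(x.shape)

def c4_equivariant_scan(x):
    """Group-lift over the C4 rotation group: scan each 90-degree
    rotated copy of x, rotate each output back, then average.
    Averaging over the full group makes the composite map commute
    with 90-degree rotations."""
    outs = [np.rot90(scan(np.rot90(x, k)), -k) for k in range(4)]
    return np.mean(outs, axis=0)

# Equivariance check: rotating the input rotates the output identically.
rng = np.random.default_rng(0)
x = rng.random((6, 6))
assert np.allclose(
    c4_equivariant_scan(np.rot90(x)),
    np.rot90(c4_equivariant_scan(x)),
)
```

The actual EQ-VMamba design presumably keeps the four orientations as separate group channels processed by the group Mamba blocks rather than averaging them immediately; the sketch only demonstrates why scanning over a full rotation group yields exact equivariance to 90° rotations.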
Problem

Research questions and friction points this paper is trying to address.

rotation equivariance
visual Mamba
rotational symmetry
robustness
cross-task generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

rotation equivariance
visual Mamba
cross-scan strategy
group Mamba blocks
parameter efficiency