Partial Ring Scan: Revisiting Scan Order in Vision State Space Models

📅 2026-02-04
🤖 AI Summary
This work addresses a limitation of existing vision state space models (SSMs): they serialize images along a fixed scan order, which alters spatial adjacency, fractures object continuity, and amplifies degradation under geometric transformations such as rotation. The authors show that scan order critically affects performance and propose PRISMamba (Partial RIng Scan Mamba), a rotation-robust traversal that partitions the image into concentric rings, aggregates tokens within each ring in an order-agnostic way, and propagates context across rings through a set of short radial SSMs. Partial channel filtering further improves efficiency by routing only the most informative channels through the recurrent ring pathway and keeping the rest on a lightweight residual branch. The model achieves 84.5% Top-1 accuracy on ImageNet-1K with 3.9G FLOPs and a throughput of 3,054 img/s on an A100 GPU, outperforming VMamba in both accuracy and throughput while requiring fewer FLOPs. It also maintains performance under rotation, whereas fixed-path scans drop by 1–2%.
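The ring idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (their code is not yet released). It assumes square rings defined by distance to the image border, mean pooling as the order-agnostic aggregator, and a fixed-decay linear recurrence standing in for a radial SSM. The names `ring_index`, `ring_pool`, and `radial_scan` are hypothetical.

```python
import numpy as np

def ring_index(size):
    """Ring id per pixel of a size x size grid: 0 at the center, growing outward."""
    pos = np.arange(size)
    border_dist = np.minimum(pos, size - 1 - pos)   # distance to the nearest edge, per axis
    depth = np.minimum(border_dist[:, None], border_dist[None, :])
    return depth.max() - depth

def ring_pool(feat, rings):
    """Order-agnostic aggregation (assumed here: mean) of the tokens in each ring.
    feat: (H, W, C) -> (R, C)."""
    num_rings = int(rings.max()) + 1
    return np.stack([feat[rings == r].mean(axis=0) for r in range(num_rings)])

def radial_scan(ring_tokens, decay=0.9):
    """Toy linear recurrence from the inner ring outward (a stand-in for a short radial SSM)."""
    state = np.zeros(ring_tokens.shape[1])
    outputs = []
    for x in ring_tokens:
        state = decay * state + (1.0 - decay) * x   # new array each step, so no aliasing
        outputs.append(state)
    return np.stack(outputs)

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))               # toy feature map: H=8, W=8, C=4
rings = ring_index(8)
context = radial_scan(ring_pool(feat, rings))       # shape (4, 4): 4 rings, 4 channels
rotated = radial_scan(ring_pool(np.rot90(feat), rings))
print(np.allclose(context, rotated))                # compares ring features before and after rotation
```

In this toy setting a 90-degree rotation only permutes the tokens inside each ring, so the pooled ring features are unchanged, while a row-major raster scan of the same feature map yields a different token sequence. Square rings give exact invariance only for multiples of 90 degrees; how the paper handles arbitrary angles is not stated in this summary.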

📝 Abstract
State Space Models (SSMs) have emerged as efficient alternatives to attention for vision tasks, offering linear-time sequence processing with competitive accuracy. Vision SSMs, however, require serializing 2D images into 1D token sequences along a predefined scan order, a factor often overlooked. We show that scan order critically affects performance by altering spatial adjacency, fracturing object continuity, and amplifying degradation under geometric transformations such as rotation. We present Partial RIng Scan Mamba (PRISMamba), a rotation-robust traversal that partitions an image into concentric rings, performs order-agnostic aggregation within each ring, and propagates context across rings through a set of short radial SSMs. Efficiency is further improved via partial channel filtering, which routes only the most informative channels through the recurrent ring pathway while keeping the rest on a lightweight residual branch. On ImageNet-1K, PRISMamba achieves 84.5% Top-1 with 3.9G FLOPs and 3,054 img/s on A100, outperforming VMamba in both accuracy and throughput while requiring fewer FLOPs. It also maintains performance under rotation, whereas fixed-path scans drop by 1–2%. These results highlight scan-order design, together with channel filtering, as a crucial, underexplored factor for accuracy, efficiency, and rotation robustness in Vision SSMs. Code will be released upon acceptance.
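The channel-routing step can be sketched in a similar spirit. The abstract does not say how "most informative" is measured, so the per-channel variance score, the `keep_ratio` parameter, and the identity residual branch below are all assumptions made for illustration. `split_channels` and `partial_filter` are hypothetical names, and `heavy_fn` stands in for the recurrent ring pathway.

```python
import numpy as np

def split_channels(feat, keep_ratio=0.25):
    """Rank channels by variance (an assumed informativeness score) and split them.
    Returns (selected, rest) as sorted index arrays."""
    num_channels = feat.shape[-1]
    k = max(1, int(round(num_channels * keep_ratio)))
    score = feat.reshape(-1, num_channels).var(axis=0)
    order = np.argsort(-score)                      # highest variance first
    return np.sort(order[:k]), np.sort(order[k:])

def partial_filter(feat, heavy_fn, keep_ratio=0.25):
    """Send only the selected channels through heavy_fn; the rest pass through unchanged
    (identity is used here as the lightweight residual branch)."""
    selected, _ = split_channels(feat, keep_ratio)
    out = feat.copy()
    out[..., selected] = heavy_fn(feat[..., selected])
    return out, selected

# Toy feature map whose channels 2 and 5 have the largest variance by construction.
base = np.arange(16.0).reshape(4, 4)
scales = np.array([0.1, 0.2, 3.0, 0.3, 0.4, 2.0, 0.5, 0.6])
feat = base[..., None] * scales                     # shape (4, 4, 8)

out, selected = partial_filter(feat, heavy_fn=lambda x: 2.0 * x, keep_ratio=0.25)
print(selected)                                     # indices routed through the expensive path
```

With `keep_ratio=0.25`, only two of the eight channels reach `heavy_fn`, so the cost of the expensive pathway scales with the number of selected channels rather than the full channel count.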
Problem

Research questions and friction points this paper is trying to address.

scan order
Vision State Space Models
rotation robustness
spatial adjacency
geometric transformations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scan Order
State Space Models
Rotation Robustness
Ring-based Aggregation
Partial Channel Filtering
Yi-Kuan Hsieh
College of Artificial Intelligence and Green Energy, National Yang Ming Chiao Tung University
Jun-Wei Hsieh
National Yang Ming Chiao Tung University
computer vision, AI, image processing
Xin Li
Computer Science Department, University at Albany, SUNY, NY, USA
Ming-Ching Chang
Computer Science Department, University at Albany, SUNY, NY, USA
Yu-Chee Tseng
College of AI, National Yang Ming Chiao Tung University
mobile computing, wireless network, artificial intelligence