QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models

📅 2025-07-13
🤖 AI Summary
Visual state space models such as VMamba carry inherent spatial redundancy in their four-directional scan. This work proposes a post-training token pruning method that needs no retraining: redundant tokens are identified from activation maps and removed before scanning, and spatial resolution is then restored with nearest-neighbor upsampling, which avoids the merge-unmerge overhead of conventional token merging. It is presented as the first plug-and-play, transferable activation-based pruning scheme designed specifically for four-directional scanning SSM architectures. On ImageNet-1K, it achieves up to 11% inference speedup with a top-1 accuracy drop below 0.9%. The method also holds up on downstream tasks, including ADE20K semantic segmentation and several medical imaging tasks evaluated with MedMamba. Unlike general-purpose token merging methods such as ToMe, it is tailored to SSMs.

📝 Abstract
State space models (SSMs) reduce the quadratic complexity of transformers by leveraging linear recurrence. Recently, VMamba has emerged as a strong SSM-based vision backbone, yet remains bottlenecked by spatial redundancy in its four-directional scan. We propose QuarterMap, a post-training activation pruning method that removes redundant spatial activations before scanning and restores dimensions via nearest-neighbor upsampling. Our method improves throughput without retraining. On ImageNet-1K, QuarterMap achieves up to 11% speedup on VMamba with less than 0.9% accuracy drop, and yields similar gains on ADE20K segmentation. Beyond VMamba, we validate QuarterMap on MedMamba, a domain-specific model that shares the same four-directional scanning structure, where it consistently improves throughput while preserving accuracy across multiple medical imaging tasks. Compared to token merging methods like ToMe, QuarterMap is tailored for SSMs and avoids costly merge-unmerge operations. Our method offers a plug-and-play tool for deployment-time efficiency without compromising transferability.

Problem

Research questions and friction points this paper is trying to address.

Reduces spatial redundancy in VMamba's four-directional scan
Improves throughput without retraining visual state space models
Preserves accuracy while pruning redundant activations in SSMs
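
To make the redundancy point concrete, the sketch below flattens one feature map into four token sequences and counts the tokens the scan has to process before and after pruning. The scan orders (row-major, column-major, and the reverse of each) and the stride-2 pruning pattern are assumptions for illustration; the page does not spell out either, and `four_direction_sequences` is a hypothetical helper, not part of VMamba.

```python
import numpy as np

def four_direction_sequences(feature_map):
    """Flatten an (H, W, C) map into four 1-D token sequences.

    Assumed orders: row-major, column-major, and the reverse of each.
    """
    h, w, c = feature_map.shape
    row_major = feature_map.reshape(h * w, c)
    col_major = feature_map.transpose(1, 0, 2).reshape(h * w, c)
    return [row_major, col_major, row_major[::-1], col_major[::-1]]

# Toy 8x8 map with 2 channels.
full = np.arange(8 * 8 * 2, dtype=np.float32).reshape(8, 8, 2)

# Hypothetical pruning pattern: keep every second row and column,
# i.e. one spatial position in four.
quarter = full[::2, ::2]

full_tokens = sum(len(s) for s in four_direction_sequences(full))
quarter_tokens = sum(len(s) for s in four_direction_sequences(quarter))
print(full_tokens, quarter_tokens)
```

Every one of the four sequences visits the same set of positions, so any position removed before scanning is saved four times over. Under these assumptions the scanned token count falls to a quarter of the original.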

Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-training token pruning for SSMs
Nearest-neighbor upsampling restores dimensions
Plug-and-play tool boosts deployment efficiency
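
The three points above can be sketched as a wrapper around an existing scan step. This is a minimal NumPy illustration, not the authors' implementation: the stride-2 selection rule, the function name `prune_scan_upsample`, and the `toy_scan` stand-in for the selective scan are all assumptions made for the example.

```python
import numpy as np

def prune_scan_upsample(feature_map, scan_fn, stride=2):
    """Prune spatial activations, run the scan, restore the resolution.

    feature_map: array of shape (H, W, C).
    scan_fn: any function mapping (h, w, C) -> (h, w, C); stands in for
             the model's scan, whose weights are left untouched.
    stride: assumed pruning pattern, keeping every `stride`-th row/column.
    """
    h, w, _ = feature_map.shape
    pruned = feature_map[::stride, ::stride]   # drop activations before the scan
    scanned = scan_fn(pruned)                  # scan runs on the smaller map
    # Nearest-neighbor upsampling: repeat each value along both spatial axes.
    restored = np.repeat(np.repeat(scanned, stride, axis=0), stride, axis=1)
    return restored[:h, :w]                    # crop when H or W is not a multiple of stride

def toy_scan(x):
    """Hypothetical stand-in: a cumulative sum over row-major token order."""
    flat = x.reshape(-1, x.shape[-1])
    return np.cumsum(flat, axis=0).reshape(x.shape)

x = np.random.rand(7, 6, 3)
out = prune_scan_upsample(x, toy_scan)
print(out.shape)
```

Because the wrapper only changes what goes into and comes out of the scan, it can be applied to a trained model at deployment time without retraining. The upsampling step is a plain repeat, so nothing has to be recorded at pruning time to undo it later.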