QuarterMap: Efficient Post-Training Token Pruning for Visual State Space Models

📅 2025-07-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the spatial redundancy inherent in the four-directional scan of visual state space models such as VMamba, this work proposes QuarterMap, a post-training token pruning method that requires no retraining. Before scanning, redundant spatial activations are removed; afterwards, the original spatial resolution is restored with nearest-neighbor upsampling, which avoids the merge-unmerge overhead of conventional token merging. The result is a plug-and-play activation pruning scheme designed specifically for SSM architectures that use four-directional scanning. On ImageNet-1K, it achieves up to 11% inference speedup on VMamba with less than 0.9% top-1 accuracy drop, and it yields similar gains on ADE20K semantic segmentation. It also transfers to MedMamba, a domain-specific model with the same scanning structure, improving throughput while preserving accuracy across multiple medical imaging tasks. Unlike general-purpose token merging methods such as ToMe, it is tailored to SSMs.

📝 Abstract

State space models (SSMs) reduce the quadratic complexity of transformers by leveraging linear recurrence. Recently, VMamba has emerged as a strong SSM-based vision backbone, yet remains bottlenecked by spatial redundancy in its four-directional scan. We propose QuarterMap, a post-training activation pruning method that removes redundant spatial activations before scanning and restores dimensions via nearest-neighbor upsampling. Our method improves throughput without retraining. On ImageNet-1K, QuarterMap achieves up to 11% speedup on VMamba with less than 0.9% accuracy drop, and yields similar gains on ADE20K segmentation. Beyond VMamba, we validate QuarterMap on MedMamba, a domain-specific model that shares the same four-directional scanning structure, where it consistently improves throughput while preserving accuracy across multiple medical imaging tasks. Compared to token merging methods like ToMe, QuarterMap is tailored for SSMs and avoids costly merge-unmerge operations. Our method offers a plug-and-play tool for deployment-time efficiency without compromising transferability.
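The prune-then-restore idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stride-2 selection rule, the function names `prune_stride` and `upsample_nearest`, and the identity stand-in for the scan are all assumptions made for illustration. The abstract only states that redundant spatial activations are removed before scanning and that dimensions are restored by nearest-neighbor upsampling.

```python
def prune_stride(fmap, stride=2):
    """Keep every `stride`-th row and column of a 2D map (list of lists).

    Assumed selection rule; the source does not specify how activations are chosen.
    """
    return [row[::stride] for row in fmap[::stride]]


def upsample_nearest(fmap, factor=2):
    """Nearest-neighbor upsampling: repeat each value `factor` times on both axes."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def scan_placeholder(fmap):
    """Hypothetical stand-in for the four-directional scan (identity here)."""
    return fmap


# A 4x4 activation map with distinct values so the effect is easy to see.
fmap = [[r * 4 + c for c in range(4)] for r in range(4)]

pruned = prune_stride(fmap)                              # 2x2 map, a quarter of the activations
restored = upsample_nearest(scan_placeholder(pruned))    # back to 4x4

print(pruned)    # [[0, 2], [8, 10]]
print(restored)  # [[0, 0, 2, 2], [0, 0, 2, 2], [8, 8, 10, 10], [8, 8, 10, 10]]
```

In this sketch the scan operates on the smaller map only, and the output shape matches the input shape, so the surrounding layers would not need to change.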

Problem

Research questions and friction points this paper is trying to address.

Reduces spatial redundancy in VMamba's four-directional scan

Improves throughput without retraining visual state space models

Preserves accuracy while pruning redundant activations in SSMs
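To make the redundancy concrete, the sketch below flattens a 2D map in four orders (row-major, column-major, and the reverse of each) and counts how many tokens the scan processes in total. The helper names, the 14x14 map size, and the halving of each side are assumptions for illustration, not figures from the paper.

```python
def four_direction_scans(fmap):
    """Flatten a 2D map into four 1D sequences: row-major, column-major, and their reverses."""
    h, w = len(fmap), len(fmap[0])
    row_major = [fmap[r][c] for r in range(h) for c in range(w)]
    col_major = [fmap[r][c] for c in range(w) for r in range(h)]
    return [row_major, col_major, row_major[::-1], col_major[::-1]]


def scanned_tokens(h, w):
    """Total tokens processed across the four scan directions."""
    return 4 * h * w


scans = four_direction_scans([[1, 2], [3, 4]])
print(scans)  # [[1, 2, 3, 4], [1, 3, 2, 4], [4, 3, 2, 1], [4, 2, 3, 1]]

# Hypothetical sizes: a 14x14 map versus the same map with each side halved.
print(scanned_tokens(14, 14))  # 784
print(scanned_tokens(7, 7))    # 196
```

Every activation appears in all four sequences, so removing one activation before the scan removes four scanned tokens. The count covers scanned tokens only and is not a throughput estimate; the reported end-to-end speedup is up to 11%.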

Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-training token pruning for SSMs

Nearest-neighbor upsampling restores dimensions

Plug-and-play tool boosts deployment efficiency
