Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning

📅 2025-04-25
🤖 AI Summary
To address the high computational cost and degraded edge-region accuracy of Vision Transformers (ViTs) in semantic segmentation, this paper proposes a plug-and-play, retraining-free progressive token pruning framework. Methodologically, it introduces a novel dual-clustering mechanism guided by low-level features—integrating structural cues (e.g., edges) with high-level semantics—and proposes a multi-scale Tsallis entropy-based dynamic weighting strategy for token importance scoring. This approach overcomes the limitations of conventional single-parameter entropy models and explicitly incorporates edge sensitivity into token importance assessment for the first time. Evaluated on multiple benchmarks, the method reduces FLOPs by 20–45% while incurring less than 0.3% mIoU degradation; notably, it achieves significantly superior edge-region segmentation accuracy compared to existing pruning methods.

📝 Abstract
Vision Transformers (ViTs) excel at semantic segmentation but demand significant computation, which makes deployment on resource-constrained devices difficult. Existing token pruning methods often overlook fundamental characteristics of visual data. This study introduces 'LVTP', a progressive token pruning framework guided by multi-scale Tsallis entropy and by low-level visual features with dual clustering. It combines high-level semantics with basic visual attributes to keep segmentation precise. A novel dynamic scoring mechanism based on multi-scale Tsallis entropy weighting overcomes the limitations of traditional single-parameter entropy. The framework also analyzes low-level features to preserve critical edge information while reducing computational cost. As a plug-and-play module, it requires no architectural changes or additional training. Evaluations across multiple datasets show 20%-45% reductions in computation with negligible performance loss, outperforming existing methods in balancing cost and accuracy, especially in complex edge regions.
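For reference, the standard definition of Tsallis entropy for a discrete distribution p = (p_1, ..., p_n) with entropic index q is given below, together with its Shannon limit. This is the textbook form only; how the paper builds p from tokens and which values of q it treats as "scales" are not stated in this summary.

```latex
S_q(p) = \frac{1}{q-1}\Bigl(1 - \sum_{i=1}^{n} p_i^{\,q}\Bigr),
\qquad
\lim_{q \to 1} S_q(p) = -\sum_{i=1}^{n} p_i \ln p_i
```

A measure with one fixed q is presumably what the abstract calls single-parameter entropy; a multi-scale variant would evaluate S_q at several values of q.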
Problem

Research questions and friction points this paper is trying to address.

Reduces computation in Vision Transformers for resource-constrained devices
Integrates low-level visual features for precise semantic segmentation
Overcomes limitations of traditional entropy in token pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive token pruning with multi-scale Tsallis entropy
Dual clustering guided by low-level visual features
Dynamic scoring mechanism for edge preservation
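The scoring idea listed above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the per-token probability vectors (for example, normalized attention rows), the q values, the fixed equal weights standing in for the dynamic weighting, the rule of keeping the highest-scoring tokens, and the function names `tsallis_entropy`, `multi_scale_score`, and `prune_tokens` are all assumptions made for illustration.

```python
import math


def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    Falls back to Shannon entropy (natural log) when q is 1.
    """
    if abs(q - 1.0) < 1e-9:
        return -sum(x * math.log(x) for x in p if x > 0)
    return (1.0 - sum(x ** q for x in p if x > 0)) / (q - 1.0)


def multi_scale_score(p, qs, weights):
    """Weighted sum of Tsallis entropies over several q values.

    Assumption: fixed weights are a placeholder for the paper's
    dynamic weighting, which this summary does not specify.
    """
    return sum(w * tsallis_entropy(p, q) for q, w in zip(qs, weights))


def prune_tokens(token_dists, keep_ratio, qs=(0.5, 1.0, 2.0), weights=None):
    """Return the sorted indices of the tokens to keep.

    token_dists: one probability vector per token (hypothetical input).
    Assumption: tokens with the highest score are the ones retained.
    """
    if weights is None:
        weights = [1.0 / len(qs)] * len(qs)
    scores = [multi_scale_score(p, qs, weights) for p in token_dists]
    k = max(1, math.ceil(keep_ratio * len(token_dists)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    dists = [
        [1.0, 0.0, 0.0, 0.0],      # one-hot: zero entropy at every q
        [0.25, 0.25, 0.25, 0.25],  # uniform: maximal entropy
        [0.7, 0.1, 0.1, 0.1],      # in between
    ]
    print(prune_tokens(dists, keep_ratio=0.5))
```

A progressive schedule could call `prune_tokens` at several layers with a shrinking token set; the layer choice and keep ratios would again be design decisions not given in this summary.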
Yuanbing Ouyang
Xidian University, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Yizhuo Liang
Xidian University, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Qingpeng Li
Xidian University, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Xinfei Guo
Shanghai Jiao Tong University
VLSI, EDA, Reliability, Low Power, Microarchitecture
Yiming Luo
PhD student, The University of Hong Kong
Robotics
Di Wu
Norwegian University of Science and Technology, Larsgaardsvegen 2, Aalesund, 6009, Norway
Hao Wang
Xidian University, No. 266, Xinglong Section, Xifeng Road, Xi’an, 710126, Shaanxi, China
Yushan Pan
Xi'an Jiaotong-Liverpool University / University of Liverpool
Machine Learning, Affective Computing, Man-Machine Interaction, Robotics