Progressive Token Length Scaling in Transformer Encoders for Efficient Universal Segmentation

πŸ“… 2024-04-23
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
To address the excessive computational overhead of Transformer encoders in general-purpose image segmentation, this paper proposes PRO-SCALE, a progressive token-length scaling strategy that adjusts the token sequence length across the layers of Mask2Former's encoder. PRO-SCALE integrates three components: (i) multi-scale feature-adaptive downsampling, (ii) inter-layer token-length scheduling, and (iii) query-aware token retention. This design achieves substantial efficiency gains without compromising accuracy: on COCO, PRO-SCALE reduces encoder GFLOPs by ~52% and overall model GFLOPs by ~27% with no drop in performance. It also generalizes across both segmentation and detection tasks. By enabling scalable, architecture-agnostic token compression, PRO-SCALE establishes a lightweight paradigm for efficient general-purpose segmentation.

πŸ“ Abstract
A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the state-of-the-art method Mask2Former spends 50% of its compute on the transformer encoder alone. This is due to the retention of a full-length token-level representation of all backbone feature scales at each encoder layer. With this observation, we propose a strategy termed PROgressive Token Length SCALing for Efficient transformer encoders (PRO-SCALE) that can be plugged into the Mask2Former segmentation architecture to significantly reduce the computational cost. The underlying principle of PRO-SCALE is: progressively scale the length of the tokens with the layers of the encoder. This allows PRO-SCALE to reduce computations by a large margin with minimal sacrifice in performance (~52% encoder and ~27% overall GFLOPs reduction with no drop in performance on the COCO dataset). Experiments conducted on public benchmarks demonstrate PRO-SCALE's flexibility in architectural configurations and exhibit potential for extension beyond segmentation tasks to encompass object detection. Code here: https://github.com/abhishekaich27/proscale-pytorch
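The abstract's core claim is that most encoder compute comes from keeping a full-length multi-scale token sequence at every layer, and that progressively scaling the token length across layers recovers most of that cost. The toy sketch below illustrates this with a back-of-the-envelope FLOP estimate: it is not the PRO-SCALE implementation, and all numbers (feature-map sizes, model width, layer count, the specific token-length schedule) are hypothetical assumptions chosen only to make the arithmetic concrete.

```python
# Toy FLOP estimate illustrating the principle of progressive token-length
# scaling in a transformer encoder. Hypothetical sketch, not the paper's method.

def layer_flops(n_tokens, d=256, ffn_mult=4):
    """Rough FLOPs for one encoder layer: self-attention + feed-forward."""
    attn = 4 * n_tokens * d * d + 2 * n_tokens * n_tokens * d  # QKV/out projections + QK^T, AV
    ffn = 2 * n_tokens * d * (ffn_mult * d) * 2                # two linear layers (mul + add)
    return attn + ffn

# Multi-scale backbone features (e.g. strides 32/16/8) flattened into tokens.
scales = [20 * 20, 40 * 40, 80 * 80]  # hypothetical spatial sizes per scale

# Baseline: every layer processes the full concatenated token sequence.
full_len = sum(scales)  # 8400 tokens
baseline = sum(layer_flops(full_len) for _ in range(6))

# Progressive: early layers see only coarse-scale tokens; finer scales
# are folded in at later layers (one possible schedule, chosen arbitrarily).
schedule = [scales[0]] * 2 + [scales[0] + scales[1]] * 2 + [full_len] * 2
progressive = sum(layer_flops(n) for n in schedule)

reduction = 1 - progressive / baseline
print(f"encoder FLOPs reduction: {reduction:.0%}")  # prints "encoder FLOPs reduction: 63%"
```

Because self-attention cost grows quadratically with token count, restricting early layers to the coarse scales removes the dominant term for most of the depth, which is why even this crude schedule lands in the same ballpark as the ~52% encoder reduction the abstract reports.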
Problem

Research questions and friction points this paper is trying to address.

Image Segmentation
Resource Efficiency
Hierarchical Detail Processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive Token Length Expansion
Mask2Former Architecture
Efficient Image Segmentation
πŸ”Ž Similar Papers
No similar papers found.
Abhishek Aich
NEC Laboratories America
Computer Vision, Deep Learning
Yumin Suh
Atmanity
computer vision, machine learning
S. Schulter
NEC Laboratories America, USA
M. Chandraker
NEC Laboratories America, USA; University of California, San Diego, USA