Efficient Semantic Segmentation via Lightweight Multiple-Information Interaction Network

📅 2024-10-03
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high computational cost and memory footprint of real-time semantic segmentation, this paper proposes the Lightweight Multiple-Information Interaction Network (LMIINet). The method introduces three key components: (1) a Lightweight Feature Interaction Bottleneck (LFIB) module built from efficient convolutions for cross-layer feature fusion; (2) an enhanced Flatten Transformer architecture that strengthens joint local-global contextual modeling; and (3) a learnable combination coefficient scheme for adaptive feature weighting. With only 0.72M parameters and 11.74G FLOPs, LMIINet achieves 72.0% mIoU at 100 FPS on Cityscapes and 69.94% mIoU at 160 FPS on CamVid (speed measured on a single RTX 2080 Ti), significantly outperforming existing lightweight approaches and establishing a new accuracy-efficiency trade-off frontier for real-time semantic segmentation.

📝 Abstract
Recently, integrating the local modeling capabilities of Convolutional Neural Networks (CNNs) with the global dependency strengths of Transformers has attracted considerable attention in the semantic segmentation community. However, substantial computational workloads and high hardware memory demands remain major obstacles to their application in real-time scenarios. In this work, we propose a Lightweight Multiple-Information Interaction Network (LMIINet) for real-time semantic segmentation, which effectively combines CNNs and Transformers while reducing redundant computation and memory footprint. It features Lightweight Feature Interaction Bottleneck (LFIB) modules composed of efficient convolutions that enhance context integration. In addition, the Flatten Transformer is improved by strengthening local-global feature interaction to capture detailed semantic information. A combination coefficient learning scheme in both the LFIB and Transformer blocks further facilitates feature interaction. Extensive experiments demonstrate that LMIINet excels in balancing accuracy and efficiency. With only 0.72M parameters and 11.74G FLOPs (Floating Point Operations), LMIINet achieves 72.0% mIoU (mean Intersection over Union) at 100 FPS (Frames Per Second) on the Cityscapes test set and 69.94% mIoU at 160 FPS on the CamVid test set using a single RTX 2080 Ti GPU.
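The abstract describes the combination coefficient learning scheme only at a high level. As a minimal sketch of one plausible realization (the function name, the use of a single scalar coefficient, and the sigmoid squashing are assumptions, not details from the paper), a learnable scalar can blend a local (CNN) branch with a global (Transformer) branch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coefficient_fusion(local_feat, global_feat, alpha):
    """Blend two feature maps with a learnable scalar coefficient.

    `alpha` is an unconstrained learnable parameter; the sigmoid maps it
    to (0, 1) so the two branches receive complementary weights that
    always sum to 1.
    """
    w = sigmoid(alpha)
    return w * local_feat + (1.0 - w) * global_feat

# Toy example: two 2x2 single-channel feature maps.
local_feat = np.ones((2, 2))
global_feat = np.zeros((2, 2))
fused = coefficient_fusion(local_feat, global_feat, alpha=0.0)  # w = 0.5
print(fused)  # every entry is 0.5
```

During training, `alpha` would be updated by backpropagation along with the other network weights, letting each block learn how strongly to favor local versus global information.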
Problem

Research questions and friction points this paper is trying to address.

Lightweight network for real-time semantic segmentation
Reducing computational workloads and memory demands
Combining CNNs and Transformers efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight Multiple-Information Interaction Network
Lightweight Feature Interaction Bottleneck modules
Improved Flatten Transformer for feature interaction
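This page does not specify which "efficient convolutions" the LFIB modules use. As an illustrative assumption (not the paper's confirmed design), factorizing a standard convolution into a depthwise plus pointwise pair is a common way lightweight modules cut parameter counts:

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ds_conv_params(c_in, c_out, k):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

c_in, c_out, k = 64, 64, 3
std = conv_params(c_in, c_out, k)     # 64 * 64 * 9  = 36864
ds = ds_conv_params(c_in, c_out, k)   # 576 + 4096   = 4672
print(std, ds, round(std / ds, 1))    # roughly 7.9x fewer weights
```

The same factorization reduces FLOPs proportionally, which is how budgets like 0.72M parameters and 11.74G FLOPs become reachable.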
Yangyang Qiu
Institute of Advanced Technology, Nanjing University of Posts and Telecommunications, Nanjing 210046, China; Key Laboratory of Artificial Intelligence, Ministry of Education, Shanghai 200240, China; also with the Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Suzhou 215006, China
Guoan Xu
Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, Australia
Guangwei Gao
Professor of PCALab@NJUST, IEEE/CCF/CSIG/CAAI/CAA Senior Member
Research interests: Pattern Recognition, Image Understanding, Machine Learning
Zhenhua Guo
Tianyijiaotong Technology Ltd., Suzhou 215100, China
Yi Yu
Graduate School of Advanced Science and Engineering, Hiroshima University, Hiroshima 739-8511, Japan
Chia-Wen Lin
Department of Electrical Engineering, National Tsing Hua University, Hsinchu, Taiwan 30013, R.O.C.