BEVANet: Bilateral Efficient Visual Attention Network for Real-Time Semantic Segmentation

📅 2025-08-10
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Real-time semantic segmentation demands both large-receptive-field modeling and precise boundary delineation, yet Vision Transformers incur prohibitive computational overhead. To address this, we propose a dual-branch efficient visual attention network. First, we design Sparse Decomposed Large Separable Kernel Attention (SDLSKA) coupled with Comprehensive Kernel Selection (CKS) to expand the receptive field at low computational cost. Second, we introduce the Deep Large Kernel Pyramid Pooling Module (DLKPPM) to enhance multi-scale contextual representation. Third, we develop the Boundary Guided Adaptive Fusion (BGAF) module to refine contour accuracy. Evaluated on Cityscapes, our model achieves 79.3% mIoU without pretraining and 81.0% mIoU with ImageNet pretraining, while operating at 33 FPS on a single GPU, setting a new state of the art for real-time semantic segmentation.
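The cost saving behind large-kernel decomposition can be illustrated with simple parameter arithmetic. The sketch below assumes the standard LKA-style factorization (a small depthwise conv, a dilated depthwise conv, and a 1x1 pointwise conv); the paper does not state SDLSKA's kernel sizes, so a 21x21 kernel with dilation 3 is a purely hypothetical example.

```python
# Hypothetical illustration of large-kernel decomposition savings
# (LKA-style: a dense K x K depthwise conv is approximated by a small
# depthwise conv + a dilated depthwise conv + a 1x1 pointwise conv).
# The kernel sizes here are assumptions for illustration, not BEVANet's values.
import math

def decomposition_cost(K: int, d: int, channels: int):
    """Per-channel weight counts of the decomposition vs. a dense K x K kernel."""
    small = 2 * d - 1                 # small depthwise kernel size
    dilated = math.ceil(K / d)        # dilated depthwise kernel size (dilation d)
    dense = K * K                     # dense depthwise weights per channel
    decomposed = small ** 2 + dilated ** 2 + channels  # + 1x1 pointwise weights
    # effective receptive field of the two stacked depthwise convs
    rf = ((dilated - 1) * d + 1) + small - 1
    return dense, decomposed, rf

dense, decomposed, rf = decomposition_cost(K=21, d=3, channels=64)
print(dense, decomposed, rf)  # 441 dense weights vs. 138 decomposed; RF 23 >= 21
```

With these (assumed) settings, the decomposition covers at least the same 21-pixel receptive field with roughly a third of the per-channel weights, which is the kind of saving that makes large receptive fields affordable in a real-time model.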

πŸ“ Abstract
Real-time semantic segmentation presents the dual challenge of designing efficient architectures that capture large receptive fields for semantic understanding while also refining detailed contours. Vision transformers model long-range dependencies effectively but incur high computational cost. To address these challenges, we introduce the Large Kernel Attention (LKA) mechanism. Our proposed Bilateral Efficient Visual Attention Network (BEVANet) expands the receptive field to capture contextual information and extracts visual and structural features using Sparse Decomposed Large Separable Kernel Attentions (SDLSKA). The Comprehensive Kernel Selection (CKS) mechanism dynamically adapts the receptive field to further enhance performance. Furthermore, the Deep Large Kernel Pyramid Pooling Module (DLKPPM) enriches contextual features by synergistically combining dilated convolutions and large kernel attention. The bilateral architecture facilitates frequent branch communication, and the Boundary Guided Adaptive Fusion (BGAF) module enhances boundary delineation by integrating spatial and semantic features under boundary guidance. BEVANet achieves real-time segmentation at 33 FPS, yielding 79.3% mIoU without pretraining and 81.0% mIoU on Cityscapes after ImageNet pretraining, demonstrating state-of-the-art performance. The code and model are available at https://github.com/maomao0819/BEVANet.
Problem

Research questions and friction points this paper is trying to address.

Real-time semantic segmentation with efficient large receptive fields
Reducing computational cost while maintaining segmentation accuracy
Enhancing boundary delineation through adaptive feature fusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Kernel Attention mechanism
Comprehensive Kernel Selection adaptation
Boundary Guided Adaptive Fusion
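The boundary-guided fusion idea above can be read as gating: a predicted boundary map weights the detail-preserving spatial branch near edges and the semantic branch elsewhere. The paper does not spell out BGAF's exact formulation, so the sketch below shows only the generic gated-fusion pattern this description suggests, with made-up toy inputs.

```python
# Hypothetical sketch of boundary-guided fusion: a boundary map b in [0, 1]
# favors fine-detail spatial features near edges and semantic context away
# from them. This is a generic gated-fusion pattern, NOT BEVANet's exact BGAF.

def boundary_guided_fuse(spatial, semantic, boundary):
    """Per-position convex combination: b * spatial + (1 - b) * semantic."""
    return [b * sp + (1.0 - b) * se
            for sp, se, b in zip(spatial, semantic, boundary)]

fused = boundary_guided_fuse(
    spatial=[1.0, 1.0, 1.0],   # detail-branch activations (toy values)
    semantic=[0.0, 0.0, 0.0],  # context-branch activations (toy values)
    boundary=[1.0, 0.5, 0.0],  # predicted edge probability per position
)
print(fused)  # [1.0, 0.5, 0.0]
```

At a strong edge (b = 1) the output follows the spatial branch; in flat interior regions (b = 0) it follows the semantic branch, which matches the stated goal of sharpening contours without sacrificing contextual labeling.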
Ping-Mao Huang
Graduate Institute of Networking and Multimedia, National Taiwan University
I-Tien Chao
Graduate Institute of Networking and Multimedia, National Taiwan University
Ping-Chia Huang
Department of Computer Science and Information Engineering, National Taiwan University
Jia-Wei Liao
National Taiwan University
Generative Modeling, Computer Vision, Inference-time Optimization, Protective AI, Medical AI
Yung-Yu Chuang
National Taiwan University
computer graphics, computer vision, multimedia