An Efficient Semantic Segmentation Decoder for In-Car or Distributed Applications

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the joint computation-transmission optimization challenge for semantic segmentation in vehicle-cloud-edge collaborative systems under real-time constraints, this paper proposes the first lightweight joint feature-and-task decoding method tailored to the SegDeformer architecture. Unlike prior CNN-based joint decoding, the method uses a low-complexity Transformer decoder that shares a common source encoder but is optimized separately for on-vehicle deployment (low latency) and distributed edge/cloud deployment (high compression ratio), substantially reducing computational overhead. Experiments demonstrate up to 11.7× faster on-vehicle inference on Cityscapes and 154.3 fps on ADE20K. In the cloud, the method requires only 0.14% of the baseline's DNN parameters while attaining state-of-the-art mIoU across a wide range of bitrates, effectively balancing accuracy, efficiency, and deployment flexibility.
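The deployment split described above (one shared source encoder feeding two separately optimized decoders, with a source codec only in the distributed path) can be illustrated with a toy sketch. This is not the paper's actual model; all classes, dimensions, and the uniform quantizer standing in for the source codec are illustrative assumptions.

```python
# Toy sketch of the shared-encoder, dual-decoder split (illustrative only;
# names, shapes, and the quantizer are assumptions, not the paper's API).
import numpy as np

rng = np.random.default_rng(0)

class SourceEncoder:
    """Shared encoder: maps an image to a compact feature map."""
    def __init__(self, in_ch=3, feat_ch=8):
        self.w = rng.standard_normal((feat_ch, in_ch)) * 0.1
    def __call__(self, img):           # img: (H, W, in_ch)
        return img @ self.w.T          # -> (H, W, feat_ch)

class TaskDecoder:
    """Lightweight decoder head: maps features to per-pixel class logits."""
    def __init__(self, feat_ch=8, n_classes=19):
        self.w = rng.standard_normal((n_classes, feat_ch)) * 0.1
    def __call__(self, feats):
        return feats @ self.w.T        # -> (H, W, n_classes)

def quantize(feats, n_bits=4):
    """Toy uniform quantizer standing in for the source codec used
    before vehicle-to-cloud transmission in the distributed path."""
    lo, hi = feats.min(), feats.max()
    levels = 2 ** n_bits - 1
    q = np.round((feats - lo) / (hi - lo + 1e-8) * levels)
    return q / levels * (hi - lo) + lo

encoder = SourceEncoder()
incar_decoder = TaskDecoder()   # optimized for latency, runs on-vehicle
cloud_decoder = TaskDecoder()   # optimized for bitrate, runs in the cloud

img = rng.standard_normal((4, 4, 3))
feats = encoder(img)            # shared between both applications

# In-car path: decode locally, no source codec in the loop.
seg_incar = incar_decoder(feats).argmax(-1)

# Distributed path: quantize features before "transmission", decode remotely.
seg_cloud = cloud_decoder(quantize(feats, n_bits=4)).argmax(-1)

print(seg_incar.shape, seg_cloud.shape)  # (4, 4) (4, 4)
```

The key design point the sketch mirrors is that the encoder is computed once on the vehicle in both applications; only the decoder (and whether a codec sits in front of it) changes between the in-car and distributed deployments.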

📝 Abstract
Modern automotive systems leverage deep neural networks (DNNs) for semantic segmentation and operate in two key application areas: (1) In-car, where the DNN solely operates in the vehicle without strict constraints on the data rate. (2) Distributed, where one DNN part operates in the vehicle and the other part typically on a large-scale cloud platform with a particular constraint on transmission bitrate efficiency. Typically, both applications share an image and source encoder, while each uses distinct (joint) source and task decoders. Prior work utilized convolutional neural networks for joint source and task decoding but did not investigate transformer-based alternatives such as SegDeformer, which offer superior performance at the cost of higher computational complexity. In this work, we propose joint feature and task decoding for SegDeformer, thereby enabling lower computational complexity in both in-car and distributed applications, despite SegDeformer's computational demands. This improves scalability in the cloud while reducing in-car computational complexity. For the in-car application, we increased the frames per second (fps) by up to a factor of $11.7$ ($1.4$ fps to $16.5$ fps) on Cityscapes and by up to a factor of $3.5$ ($43.3$ fps to $154.3$ fps) on ADE20K, while being on-par w.r.t. the mean intersection over union (mIoU) of the transformer-based baseline that doesn't compress by a source codec. For the distributed application, we achieve state-of-the-art (SOTA) over a wide range of bitrates on the mIoU metric, while using only $0.14$% ($0.04$%) of cloud DNN parameters used in previous SOTA, reported on ADE20K (Cityscapes).
Problem

Research questions and friction points this paper is trying to address.

Enhancing semantic segmentation decoder efficiency for automotive systems
Reducing computational complexity in both in-car and distributed applications
Achieving higher frame rates with fewer cloud DNN parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based joint feature and task decoding
Reduced computational complexity for in-car systems
Minimal cloud parameters with SOTA bitrate efficiency
Danish Nazir
Technische Universität Braunschweig, Braunschweig, Germany
Gowtham Sai Inti
Group Innovation, Volkswagen AG, Wolfsburg, Germany
Timo Bartels
Technische Universität Braunschweig, Braunschweig, Germany
Jan Piewek
Group Innovation, Volkswagen AG, Wolfsburg, Germany
Thorsten Bagdonat
Group Innovation, Volkswagen AG, Wolfsburg, Germany
Tim Fingscheidt
Professor, IEEE Fellow, ITG Fellow, Technische Universität Braunschweig, Germany
Speech Enhancement, Acoustic Signal Processing, Speech Processing, Environment Perception, NLP