MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention

📅 2026-03-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing crack segmentation methods struggle to model local textures, global dependencies, and sequential context at the same time, which limits how well they represent complex crack structures. This work proposes MixerCSeg, an architecture that unifies CNN-based local pathways, Transformer-based global pathways, and Mamba-based sequential pathways within a single encoder to achieve multidimensional feature disentanglement. The approach introduces a TransMixer mechanism to decouple Mamba’s attention behavior, complemented by direction-guided edge-gated convolution (DEGConv), spatial block processing, and a spatial refinement multi-level fusion (SRF) module. MixerCSeg achieves state-of-the-art performance across multiple crack segmentation benchmarks with only 2.05 GFLOPs and 2.54M parameters, balancing computational efficiency and representational power.
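
To make the three-pathway idea in the summary concrete, here is a minimal toy sketch on a 1D sequence of scalars. It is not the MixerCSeg encoder or the TransMixer: the function names (`local_path`, `global_path`, `sequential_path`, `mix`), the moving-average window, the scalar attention, the fixed decay, and the equal fusion weights are all assumptions chosen for illustration only.

```python
import math

def local_path(x, k=3):
    """Local pathway (assumed stand-in for a convolution): moving average over k neighbors."""
    n, r = len(x), k // 2
    out = []
    for i in range(n):
        window = x[max(0, i - r): min(n, i + r + 1)]
        out.append(sum(window) / len(window))
    return out

def global_path(x):
    """Global pathway (assumed stand-in for attention): softmax self-attention over scalar tokens."""
    out = []
    for q in x:
        scores = [q * key for key in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / z)
    return out

def sequential_path(x, decay=0.5):
    """Sequential pathway (assumed stand-in for a state-space scan): h_t = decay*h_{t-1} + (1-decay)*x_t."""
    h, out = 0.0, []
    for v in x:
        h = decay * h + (1.0 - decay) * v
        out.append(h)
    return out

def mix(x, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Fuse the three pathways with fixed weights (hypothetical; a real model would learn the fusion)."""
    paths = (local_path(x), global_path(x), sequential_path(x))
    return [sum(w * p[i] for w, p in zip(weights, paths)) for i in range(len(x))]

signal = [0.0, 0.0, 1.0, 0.0, 0.0]  # a single "crack-like" spike
mixed = mix(signal)
```

Each pathway sees the same input but summarizes it differently: the local path only looks at neighbors, the global path weighs every position against every other, and the sequential path carries a running state from left to right.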

📝 Abstract
Feature encoders play a key role in pixel-level crack segmentation by shaping the representation of fine textures and thin structures. Existing CNN-, Transformer-, and Mamba-based models each capture only part of the required spatial or structural information, leaving clear gaps in modeling complex crack patterns. To address this, we present MixerCSeg, a mixer architecture designed like a coordinated team of specialists, where CNN-like pathways focus on local textures, Transformer-style paths capture global dependencies, and Mamba-inspired flows model sequential context within a single encoder. At the core of MixerCSeg is the TransMixer, which explores Mamba's latent attention behavior while establishing dedicated pathways that naturally express both locality and global awareness. To further enhance structural fidelity, we introduce a spatial block processing strategy and a Direction-guided Edge Gated Convolution (DEGConv) that strengthens edge sensitivity under irregular crack geometries with minimal computational overhead. A Spatial Refinement Multi-Level Fusion (SRF) module is then employed to refine multi-scale details without increasing complexity. Extensive experiments on multiple crack segmentation benchmarks show that MixerCSeg achieves state-of-the-art performance with only 2.05 GFLOPs and 2.54 M parameters, demonstrating both efficiency and strong representational capability. The code is available at https://github.com/spiderforest/MixerCSeg.
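
As a rough illustration of what direction-aware edge gating can look like, the sketch below scores each pixel with four fixed 3x3 line-detection masks, keeps the strongest direction, and squashes that response into a gate between 0 and 1. This is not the paper's DEGConv: the fixed masks stand in for learned filters, and the names `LINE_KERNELS`, `correlate3x3`, `direction_gate`, and `apply_gate`, as well as the `s / (1 + s)` squashing, are hypothetical choices made for this example.

```python
# Toy illustration only: fixed line-detection masks stand in for learned filters.
LINE_KERNELS = {
    "horizontal": [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]],
    "vertical": [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]],
    "diag_up": [[-1, -1, 2], [-1, 2, -1], [2, -1, -1]],
    "diag_down": [[2, -1, -1], [-1, 2, -1], [-1, -1, 2]],
}

def correlate3x3(img, kernel):
    """3x3 cross-correlation with zero padding; output has the same size as img."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc
    return out

def direction_gate(img):
    """Return (gate, direction): gate in [0, 1) from the strongest positive directional response."""
    h, w = len(img), len(img[0])
    responses = {name: correlate3x3(img, k) for name, k in LINE_KERNELS.items()}
    gate = [[0.0] * w for _ in range(h)]
    direction = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            name, strongest = max(
                ((n, r[y][x]) for n, r in responses.items()), key=lambda t: t[1]
            )
            if strongest > 0:  # negative or zero responses leave the gate closed
                gate[y][x] = strongest / (1.0 + strongest)
                direction[y][x] = name
    return gate, direction

def apply_gate(features, gate):
    """Element-wise gating of a feature map."""
    return [[f * g for f, g in zip(frow, grow)] for frow, grow in zip(features, gate)]

# A 5x7 image with a one-pixel-wide vertical "crack" in column 3.
crack = [[1.0 if x == 3 else 0.0 for x in range(7)] for _ in range(5)]
gate, direction = direction_gate(crack)
gated = apply_gate(crack, gate)
```

On this input the gate opens along the crack column with `vertical` as the dominant direction and stays closed on the flat background, which is the qualitative behavior an edge-sensitive gate is meant to provide for thin structures.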
Problem

Research questions and friction points this paper is trying to address.

crack segmentation
feature encoder
structural representation
complex crack patterns
pixel-level segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixer Architecture
Decoupled Mamba Attention
Crack Segmentation
Edge-Guided Convolution
Efficient Multi-Path Encoder
Zilong Zhao
National University of Singapore & Betterdata
generative AI, machine learning, distributed system, control theory
Zhengming Ding
Assistant Professor of Computer Science, Tulane University
Machine Learning, Computer Vision
Pei Niu
School of Qilu Transportation, Shandong University, China
Wenhao Sun
School of Qilu Transportation, Shandong University, China
Feng Guo
School of Qilu Transportation, Shandong University, China