MixerCSeg: An Efficient Mixer Architecture for Crack Segmentation via Decoupled Mamba Attention

📅 2026-03-01

🤖 AI Summary

Existing crack segmentation methods struggle to model local textures, global dependencies, and sequential context at the same time, so they represent complex crack structures poorly. This work proposes MixerCSeg, an architecture that combines CNN-like local pathways, Transformer-style global pathways, and Mamba-inspired sequential pathways within a single encoder to disentangle features along these three dimensions. Its core is a TransMixer mechanism that decouples Mamba's attention behavior and establishes dedicated pathways for locality and global awareness. This is complemented by a spatial block processing strategy, a Direction-guided Edge Gated Convolution (DEGConv), and a Spatial Refinement Multi-Level Fusion (SRF) module. MixerCSeg reports state-of-the-art performance on multiple crack segmentation benchmarks with only 2.05 GFLOPs and 2.54M parameters, combining computational efficiency with strong representational capability.
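
To put the 2.54M parameter figure in context, the short calculation below estimates how much memory the weights alone would occupy at common numeric precisions. This is a back-of-the-envelope sketch, not a measurement from the paper: the helper name `param_memory_mb` is hypothetical, the precisions are assumptions about how a model might be stored, and activations, buffers, and optimizer state are not counted.

```python
def param_memory_mb(num_params, bytes_per_param):
    """Weight storage in decimal megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


# Parameter count as reported in the summary above.
PARAMS = 2.54e6

# Assumed storage precisions; the source does not state which one is used.
for name, width in (("fp32", 4), ("fp16", 2), ("int8", 1)):
    print(f"{name}: {param_memory_mb(PARAMS, width):.2f} MB")
```

Under these assumptions the weights take roughly 10 MB in 32-bit floating point, which is consistent with the summary's emphasis on efficiency.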

📝 Abstract

Feature encoders play a key role in pixel-level crack segmentation by shaping the representation of fine textures and thin structures. Existing CNN-, Transformer-, and Mamba-based models each capture only part of the required spatial or structural information, leaving clear gaps in modeling complex crack patterns. To address this, we present MixerCSeg, a mixer architecture designed like a coordinated team of specialists, where CNN-like pathways focus on local textures, Transformer-style paths capture global dependencies, and Mamba-inspired flows model sequential context within a single encoder. At the core of MixerCSeg is the TransMixer, which explores Mamba's latent attention behavior while establishing dedicated pathways that naturally express both locality and global awareness. To further enhance structural fidelity, we introduce a spatial block processing strategy and a Direction-guided Edge Gated Convolution (DEGConv) that strengthens edge sensitivity under irregular crack geometries with minimal computational overhead. A Spatial Refinement Multi-Level Fusion (SRF) module is then employed to refine multi-scale details without increasing complexity. Extensive experiments on multiple crack segmentation benchmarks show that MixerCSeg achieves state-of-the-art performance with only 2.05 GFLOPs and 2.54 M parameters, demonstrating both efficiency and strong representational capability. The code is available at https://github.com/spiderforest/MixerCSeg.
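
The three kinds of context named in the abstract can be illustrated with a toy example on a 1D signal. The sketch below is not the authors' TransMixer and does not reproduce any code from the linked repository. All function names are hypothetical, the three outputs are fused by a plain average chosen for simplicity, and the sequential path uses a fixed linear recurrence rather than Mamba's input-dependent (selective) parameters.

```python
import math


def local_path(x, kernel=(0.25, 0.5, 0.25)):
    """CNN-like pathway: 3-tap convolution with edge-replication padding."""
    n = len(x)
    out = []
    for i in range(n):
        left = x[max(i - 1, 0)]
        right = x[min(i + 1, n - 1)]
        out.append(kernel[0] * left + kernel[1] * x[i] + kernel[2] * right)
    return out


def global_path(x):
    """Transformer-style pathway: scalar softmax self-attention (q = k = v = x)."""
    out = []
    for q in x:
        scores = [q * k for k in x]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append(sum(w * v for w, v in zip(weights, x)) / total)
    return out


def sequential_path(x, decay=0.5):
    """Sequential pathway: h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A fixed-coefficient recurrence; real Mamba layers make these
    coefficients depend on the input.
    """
    h = 0.0
    out = []
    for v in x:
        h = decay * h + (1 - decay) * v
        out.append(h)
    return out


def toy_mixer(x):
    """Fuse the three pathways by averaging (an assumption of this sketch)."""
    paths = (local_path(x), global_path(x), sequential_path(x))
    return [sum(vals) / len(paths) for vals in zip(*paths)]


if __name__ == "__main__":
    signal = [0, 0, 1, 0, 0]  # a single thin "crack" in a flat background
    print("local:     ", local_path(signal))
    print("global:    ", global_path(signal))
    print("sequential:", sequential_path(signal))
    print("mixed:     ", toy_mixer(signal))
```

On the impulse input, the local path spreads the response only to immediate neighbours, the global path lets every position attend to every other, and the sequential path carries a decaying trace forward in scan order. This is the complementary behaviour the abstract attributes to the three families of models.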

Problem

Research questions and friction points this paper is trying to address.

crack segmentation

feature encoder

structural representation

complex crack patterns

pixel-level segmentation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixer Architecture

Decoupled Mamba Attention

Crack Segmentation