🤖 AI Summary
Existing semantic segmentation methods often suffer from insufficient decoder capacity, struggling to jointly model contextual information and preserve boundary details. To address this, we propose WaveSeg, the first framework that jointly refines features in both spatial and wavelet domains. WaveSeg explicitly incorporates high-frequency priors to enhance edge responsiveness; introduces a spectral decomposition attention module that synergistically integrates wavelet high-frequency components with Mamba’s long-range modeling capability; employs reparameterized convolutions to ensure low-frequency semantic integrity; and adopts residual-guided multi-scale fusion to improve feature consistency. Evaluated on multiple standard benchmarks, WaveSeg consistently outperforms state-of-the-art methods, achieving superior trade-offs among mean Intersection-over-Union (mIoU), boundary accuracy, and inference efficiency. These results empirically validate the effectiveness of spectrum-aware modeling for fine-grained semantic segmentation.
📝 Abstract
While recent semantic segmentation networks heavily rely on powerful pretrained encoders, most employ simplistic decoders, leading to suboptimal trade-offs between semantic context and fine-grained detail preservation. To address this, we propose a novel decoder architecture, WaveSeg, which jointly optimizes feature refinement in spatial and wavelet domains. Specifically, high-frequency components are first learned from input images as explicit priors to reinforce boundary details at early stages. A multi-scale fusion mechanism, Dual Domain Operation (DDO), is then applied, and the novel Spectrum Decomposition Attention (SDA) block is proposed, which is developed to leverage Mamba's linear-complexity long-range modeling to enhance high-frequency structural details. Meanwhile, reparameterized convolutions are applied to preserve low-frequency semantic integrity in the wavelet domain. Finally, a residual-guided fusion integrates multi-scale features with boundary-aware representations at native resolution, producing semantically and structurally rich feature maps. Extensive experiments on standard benchmarks demonstrate that WaveSeg, leveraging wavelet-domain frequency prior with Mamba-based attention, consistently outperforms state-of-the-art approaches both quantitatively and qualitatively, achieving efficient and precise segmentation.