🤖 AI Summary
To address the high memory consumption and weak local detail modeling of Transformers in 3D medical image segmentation, this paper proposes WaveFormer—a wavelet-driven frequency-domain feature representation framework. Inspired by the top-down recognition mechanism of the human visual system, WaveFormer employs the discrete wavelet transform (DWT) for multi-scale frequency-domain encoding, introduces a wavelet-based feature summarization and reconstruction module in place of heavy upsampling layers, and designs a lightweight 3D self-attention mechanism to jointly capture global context and fine-grained local structure. The architecture is biologically motivated and significantly reduces both parameter count and GPU memory footprint. Evaluated on three major benchmarks—BraTS2023, FLARE2021, and KiTS2023—WaveFormer performs on par with state-of-the-art methods while offering substantially lower computational complexity, making it an efficient and interpretable approach to 3D medical image segmentation.
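To make the multi-scale frequency-domain encoding concrete, here is a minimal NumPy sketch of a single-level 3D DWT. It assumes an orthonormal Haar basis (the summary does not specify which wavelet the paper uses); one level splits a volume into eight sub-bands and halves each spatial dimension, which is why the low-frequency band can act as a compact global summary of the volume.

```python
import numpy as np

def haar_dwt_1d(x, axis):
    """One-level orthonormal Haar split along one axis:
    average (low-pass) and difference (high-pass) of sample pairs."""
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    low = (even + odd) / np.sqrt(2)   # approximation coefficients
    high = (even - odd) / np.sqrt(2)  # detail coefficients
    return low, high

def haar_dwt_3d(volume):
    """Apply the 1D split along each of the three axes in turn,
    producing 8 sub-bands keyed 'aaa' ... 'ddd' ('a' = low, 'd' = high)."""
    bands = {'': volume}
    for axis in range(3):
        bands = {key + tag: coeff
                 for key, v in bands.items()
                 for tag, coeff in zip('ad', haar_dwt_1d(v, axis))}
    return bands

# Hypothetical toy volume standing in for a 3D feature map:
x = np.random.rand(8, 8, 8)
bands = haar_dwt_3d(x)
# Each sub-band has half the resolution per axis; the transform is
# orthonormal, so the total energy of the sub-bands equals that of x.
```

The energy-preservation property is what makes the low-frequency band a lossless-in-aggregate summary rather than a blurred downsample: the discarded resolution is retained in the detail bands instead of being thrown away.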
📝 Abstract
Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D transformer that: i) leverages the fundamental frequency-domain properties of features for contextual representation, and ii) is inspired by the top-down mechanism of the human visual recognition system, making it a biologically motivated architecture. By employing discrete wavelet transformations (DWT) at multiple scales, WaveFormer preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction. This significantly reduces the number of parameters, which is critical for real-world deployment where computational resources and training times are constrained. Furthermore, the model is generic and easily adaptable to diverse applications. Evaluations on BraTS2023, FLARE2021, and KiTS2023 demonstrate performance on par with state-of-the-art methods while offering substantially lower computational complexity.
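Why wavelet reconstruction can replace learned upsampling: the DWT is exactly invertible, so a decoder can recover full resolution from the sub-bands with a fixed, parameter-free inverse transform. The self-contained NumPy sketch below demonstrates this round trip, again assuming an orthonormal Haar basis (an assumption; the abstract does not name the wavelet).

```python
import numpy as np

def haar_fwd(x, axis):
    # Forward Haar step: pairwise average (low) and difference (high).
    even = np.take(x, np.arange(0, x.shape[axis], 2), axis=axis)
    odd = np.take(x, np.arange(1, x.shape[axis], 2), axis=axis)
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_inv(low, high, axis):
    # Inverse Haar step: re-interleave the recovered even/odd samples.
    shape = list(low.shape)
    shape[axis] *= 2
    out = np.empty(shape)
    sl = [slice(None)] * low.ndim
    sl[axis] = slice(0, None, 2)
    out[tuple(sl)] = (low + high) / np.sqrt(2)
    sl[axis] = slice(1, None, 2)
    out[tuple(sl)] = (low - high) / np.sqrt(2)
    return out

def dwt3(vol):
    # One-level 3D decomposition into 8 sub-bands ('a' = low, 'd' = high).
    bands = {'': vol}
    for ax in range(3):
        bands = {k + tag: c for k, v in bands.items()
                 for tag, c in zip('ad', haar_fwd(v, ax))}
    return bands

def idwt3(bands):
    # Merge sub-band pairs axis by axis, in reverse order of decomposition.
    for ax in (2, 1, 0):
        prefixes = {k[:-1] for k in bands}
        bands = {p: haar_inv(bands[p + 'a'], bands[p + 'd'], ax)
                 for p in prefixes}
    return bands['']

# Hypothetical toy volume; idwt3(dwt3(x)) recovers x to machine precision.
x = np.random.rand(8, 8, 8)
reconstructed = idwt3(dwt3(x))
```

Because reconstruction here is a fixed linear map rather than a stack of transposed convolutions, it contributes no parameters and little memory, which is consistent with the abstract's claim that wavelet-based reconstruction is what cuts the decoder's cost.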