A Multi-Modal Fusion Framework for Brain Tumor Segmentation Based on 3D Spatial-Language-Vision Integration and Bidirectional Interactive Attention Mechanism

📅 2025-07-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address ill-defined tumor boundaries and insufficient integration of multimodal information in brain tumor segmentation, this paper proposes a spatial-language-vision fusion framework. A multimodal semantic fusion adapter hierarchically decouples 3D MRI volumes and structured clinical text descriptions, and a bidirectional interactive vision-language attention mechanism lets the two modalities refine each other's features iteratively. On the BraTS 2020 dataset, the model achieves a mean Dice score of 0.8505 and a 95% Hausdorff distance (HD95) of 2.8256 mm, outperforming state-of-the-art methods including SCAU-Net, CA-Net, and 3D U-Net. The core contribution is the first integration of structured clinical text into a 3D medical image segmentation pipeline, combined with bidirectional attention for language-guided, boundary-precise tumor delineation.
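The summary does not give the equations of the bidirectional attention mechanism. As a rough illustration only, the sketch below shows one common way to realize bidirectional vision-language attention: two scaled dot-product cross-attention passes with residual updates, alternated for a fixed number of iterations. The function names, the residual form, the `n_iters` parameter, the token counts, and the random weights are all assumptions, not the paper's design.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    # Scaled dot-product attention: tokens of one modality attend to the other.
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def bidirectional_attention(vis, txt, params, n_iters=2):
    """Alternate vision->text and text->vision updates (hypothetical sketch).

    vis: (N_voxels, d) flattened 3D visual features
    txt: (N_tokens, d) clinical-text token embeddings
    params: {"v2t": (w_q, w_k, w_v), "t2v": (w_q, w_k, w_v)}, each (d, d)
    """
    for _ in range(n_iters):
        # Visual tokens gather language context (residual update).
        vis = vis + cross_attention(vis, txt, *params["v2t"])
        # Text tokens are then refined by the updated visual features.
        txt = txt + cross_attention(txt, vis, *params["t2v"])
    return vis, txt


rng = np.random.default_rng(0)
d = 16
vis = rng.standard_normal((4, 4, 4, d)).reshape(-1, d)  # 4x4x4 grid -> 64 tokens
txt = rng.standard_normal((8, d))                        # 8 text tokens
params = {
    name: tuple(0.1 * rng.standard_normal((d, d)) for _ in range(3))
    for name in ("v2t", "t2v")
}
vis_out, txt_out = bidirectional_attention(vis, txt, params)
print(vis_out.shape, txt_out.shape)  # (64, 16) (8, 16)
```

Each modality keeps its own token count and feature width, so the loop can run for any number of iterations; the residual form means that with zero weights the features pass through unchanged.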

๐Ÿ“ Abstract
This study develops a multi-modal fusion framework for brain tumor segmentation that integrates spatial, language, and vision information through bidirectional interactive attention to improve segmentation accuracy and boundary delineation.
Methods: We propose two core components: the Multi-modal Semantic Fusion Adapter (MSFA), which integrates 3D MRI data with clinical text descriptions through hierarchical semantic decoupling, and Bidirectional Interactive Visual-semantic Attention (BIVA), which enables iterative information exchange between the modalities. The framework was evaluated on the BraTS 2020 dataset of 369 multi-institutional MRI scans.
Results: The proposed method achieved an average Dice coefficient of 0.8505 and a 95% Hausdorff distance of 2.8256 mm across the enhancing tumor, tumor core, and whole tumor regions, outperforming state-of-the-art methods including SCAU-Net, CA-Net, and 3D U-Net. Ablation studies confirmed that the semantic and spatial modules contribute critically to boundary precision.
Conclusion: Multi-modal semantic fusion combined with bidirectional interactive attention significantly enhances brain tumor segmentation performance and offers a new approach to integrating clinical knowledge into medical image analysis.
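Both reported metrics are computed per tumor region. A minimal numpy sketch, assuming the BraTS 2020 label convention (1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor) and the usual nested regions WT = {1, 2, 4}, TC = {1, 4}, ET = {4}. The HD95 variant here pools both directed surface distances before taking the 95th percentile; evaluation toolkits differ in such details and the paper's evaluation code is not given, so numbers from this sketch are not directly comparable to the reported 0.8505 and 2.8256 mm. All function names are illustrative.

```python
import numpy as np

# BraTS 2020 labels: 1 = necrotic/non-enhancing core, 2 = edema, 4 = enhancing tumor.
REGIONS = {"WT": (1, 2, 4), "TC": (1, 4), "ET": (4,)}


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    # Convention (an assumption): two empty masks count as a perfect match.
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(pred, gt).sum() / denom)


def surface_points(mask: np.ndarray, spacing) -> np.ndarray:
    # A foreground voxel is on the surface if any 6-connected neighbour is background.
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    core = m[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(m, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    # Scale voxel indices by the voxel spacing so distances come out in mm.
    return np.argwhere(core & ~interior) * np.asarray(spacing, dtype=float)


def hd95(pred, gt, spacing=(1.0, 1.0, 1.0)) -> float:
    p, g = surface_points(pred, spacing), surface_points(gt, spacing)
    if len(p) == 0 or len(g) == 0:
        return float("nan")  # undefined when either mask is empty
    # Brute-force pairwise distances: fine for toy volumes, too slow for full MRI.
    dist = np.linalg.norm(p[:, None, :] - g[None, :, :], axis=-1)
    # Pool both directed surface distances, then take the 95th percentile.
    return float(np.percentile(np.concatenate([dist.min(axis=1), dist.min(axis=0)]), 95))


def region_scores(pred_labels, gt_labels, spacing=(1.0, 1.0, 1.0)):
    scores = {}
    for name, labels in REGIONS.items():
        p, g = np.isin(pred_labels, labels), np.isin(gt_labels, labels)
        scores[name] = (dice(p, g), hd95(p, g, spacing))
    return scores


gt = np.zeros((10, 10, 10), dtype=np.uint8)
gt[2:8, 2:8, 2:8] = 2  # edema
gt[3:7, 3:7, 3:7] = 1  # necrotic / non-enhancing core
gt[4:6, 4:6, 4:6] = 4  # enhancing tumor
print(region_scores(gt.copy(), gt))
# {'WT': (1.0, 0.0), 'TC': (1.0, 0.0), 'ET': (1.0, 0.0)}
```

A mean Dice such as the reported 0.8505 would then be the average of the three per-region Dice values (and, in practice, over all evaluated cases).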

Problem

Research questions and friction points this paper is trying to address.

Develop multi-modal fusion for brain tumor segmentation
Integrate spatial, language, and vision information via bidirectional attention
Improve segmentation accuracy and boundary delineation

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D spatial-language-vision fusion framework
Bidirectional interactive attention mechanism
Multi-modal semantic fusion adapter
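Neither the summary nor the abstract specifies how the fusion adapter injects clinical text into the 3D features. To illustrate only the general idea of text-conditioned volumetric features, the sketch below swaps in FiLM (feature-wise linear modulation), a well-known conditioning technique; it is not MSFA, and the function name `film_fuse`, the shapes, and the (1 + gamma) parameterization are assumptions.

```python
import numpy as np


def film_fuse(vis_feat, text_emb, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM conditioning of a 3D feature map on a text embedding.

    A stand-in for text-to-image fusion, not the paper's MSFA.
    vis_feat: (C, D, H, W) 3D feature map from an MRI encoder
    text_emb: (E,) pooled embedding of the clinical text
    w_gamma, w_beta: (E, C); b_gamma, b_beta: (C,)
    """
    gamma = text_emb @ w_gamma + b_gamma  # (C,) per-channel scale offset
    beta = text_emb @ w_beta + b_beta     # (C,) per-channel shift
    # Broadcast the per-channel parameters over the three spatial axes.
    return (1.0 + gamma)[:, None, None, None] * vis_feat + beta[:, None, None, None]


rng = np.random.default_rng(0)
C, E = 8, 32
vis_feat = rng.standard_normal((C, 4, 4, 4))
text_emb = rng.standard_normal(E)
w_g, w_b = 0.01 * rng.standard_normal((E, C)), 0.01 * rng.standard_normal((E, C))
fused = film_fuse(vis_feat, text_emb, w_g, np.zeros(C), w_b, np.zeros(C))
print(fused.shape)  # (8, 4, 4, 4)
```

Writing the scale as (1 + gamma) keeps the module close to the identity when its weights are small, so the visual pathway is not disrupted early in training.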