AI Summary
To address ill-defined tumor boundaries and insufficient multimodal information integration in brain tumor segmentation, this paper proposes a spatial-language-vision fusion framework. It introduces a multimodal semantic fusion adapter that integrates 3D MRI volumes with structured clinical text descriptions through hierarchical semantic decoupling, and a bidirectional interactive vision-language attention mechanism that lets the two modalities iteratively refine each other's features. Evaluated on the BraTS 2020 dataset, the model achieves a mean Dice score of 0.8505 and a 95% Hausdorff distance of 2.8256 mm, outperforming state-of-the-art methods including SCAU-Net, CA-Net, and 3D U-Net. The core contribution lies in the first integration of structured clinical text into a 3D medical image segmentation pipeline, coupled with bidirectional attention to enable language-guided, boundary-precise tumor delineation.
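The idea of a bidirectional exchange between vision and language features can be illustrated with a minimal sketch. This is not the paper's attention module: it assumes plain scaled dot-product cross-attention with no learned projections, a residual update in each direction, and hypothetical toy token values. The names `cross_attend` and `bidirectional_step` are illustrative only.

```python
import math

def softmax(row):
    # Subtract the maximum for numerical stability.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Scaled dot-product attention: each query row becomes a
    weighted mean of the context rows."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        out.append([sum(w * k[j] for w, k in zip(weights, context))
                    for j in range(len(q))])
    return out

def add(a, b):
    # Element-wise sum of two equally shaped token matrices.
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def bidirectional_step(vision, text):
    """One exchange; both updates read the pre-update tokens."""
    new_vision = add(vision, cross_attend(vision, text))  # language-guided vision
    new_text = add(text, cross_attend(text, vision))      # vision-guided language
    return new_vision, new_text

# Hypothetical toy inputs: 3 image tokens and 2 text tokens, feature size 2.
vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[0.5, 0.5], [1.0, -1.0]]

for _ in range(2):  # number of rounds is an arbitrary choice here
    vision, text = bidirectional_step(vision, text)
```

Each round leaves the token counts unchanged, so the step can be repeated as often as needed. A real model would add learned query, key, and value projections and normalization, which are omitted here.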
Abstract
This study aims to develop a novel multi-modal fusion framework for brain tumor segmentation that integrates spatial, language, and vision information through bidirectional interactive attention mechanisms to improve segmentation accuracy and boundary delineation. Methods: We propose two core components: a Multi-modal Semantic Fusion Adapter (MSFA), which integrates 3D MRI data with clinical text descriptions through hierarchical semantic decoupling, and Bidirectional Interactive Visual-semantic Attention (BIVA), which enables iterative information exchange between modalities. The framework was evaluated on the BraTS 2020 dataset, comprising 369 multi-institutional MRI scans. Results: The proposed method achieved an average Dice coefficient of 0.8505 and a 95% Hausdorff distance of 2.8256 mm across the enhancing tumor, tumor core, and whole tumor regions, outperforming state-of-the-art methods including SCAU-Net, CA-Net, and 3D U-Net. Ablation studies confirmed the critical contributions of the semantic and spatial modules to boundary precision. Conclusion: Multi-modal semantic fusion combined with bidirectional interactive attention significantly enhances brain tumor segmentation performance, establishing a new paradigm for integrating clinical knowledge into medical image analysis.
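For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a Dice coefficient and a 95% Hausdorff distance can be computed for a single region. It makes several simplifying assumptions that may differ from the paper's evaluation code: masks are given as sets of voxel coordinates, distances are measured between all foreground voxels rather than extracted surfaces, the 95th percentile uses the nearest-rank rule, and voxel spacing defaults to 1 mm isotropic. The function names and the toy cube are hypothetical.

```python
import math

def dice(pred, truth):
    """Dice coefficient between two sets of foreground voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention assumed here: two empty masks agree
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))

def _directed_distances(src, dst, spacing):
    """For each voxel in src, distance in mm to the nearest voxel in dst."""
    return [
        min(math.sqrt(sum(((a - b) * s) ** 2
                          for a, b, s in zip(p, q, spacing)))
            for q in dst)
        for p in src
    ]

def _percentile_nearest_rank(values, q):
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100.0 * len(ordered)))
    return ordered[rank - 1]

def hd95(pred, truth, spacing=(1.0, 1.0, 1.0)):
    """Symmetric 95th-percentile Hausdorff distance (brute force)."""
    if not pred or not truth:
        raise ValueError("HD95 is undefined when either mask is empty")
    forward = _percentile_nearest_rank(
        _directed_distances(pred, truth, spacing), 95)
    backward = _percentile_nearest_rank(
        _directed_distances(truth, pred, spacing), 95)
    return max(forward, backward)

def cube(lo, hi):
    # Solid cube of voxels with coordinates in [lo, hi) on every axis.
    return {(x, y, z)
            for x in range(lo, hi)
            for y in range(lo, hi)
            for z in range(lo, hi)}

truth = cube(0, 4)                             # 4 x 4 x 4 toy "tumor"
pred = {(x + 1, y, z) for (x, y, z) in truth}  # same cube shifted one voxel in x

print(dice(pred, truth))  # 0.75
print(hd95(pred, truth))  # 1.0
```

In the toy case, a one-voxel shift keeps 48 of 64 voxels overlapping, giving a Dice of 0.75, and the worst mismatches sit one voxel (1 mm) away. The brute-force distance search is quadratic in the number of voxels, so real 3D volumes would normally use a distance transform instead.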