TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration

📅 2025-06-20

📈 Citations: 0

✨ Influential: 0

career value

189K/year

🤖 AI Summary

To address the scarcity of large-scale, volume-level paired multimodal datasets comprising MRI volumes and corresponding clinical text annotations—limiting vision-language co-modeling in brain tumor analysis—this work introduces TextBraTS, the first publicly available, large-scale, volume-aligned multimodal brain tumor dataset. We further propose a text-guided 3D segmentation framework featuring a novel serialized cross-modal cross-attention mechanism that enables fine-grained alignment between BERT-encoded clinical text and hierarchical 3D U-Net features. Systematic evaluation of templated prompting strategies and multiple fusion schemes demonstrates consistent performance gains, yielding significant Dice score improvements of +2.1%–3.7% for whole tumor, tumor core, and enhancing tumor segmentation. The dataset, source code, and pre-trained models are fully open-sourced, establishing a foundational infrastructure and methodological paradigm for medical multimodal research.

Technology Category

Application Category

📝 Abstract

Deep learning has demonstrated remarkable success in medical image segmentation and computer-aided diagnosis. In particular, numerous advanced methods have achieved state-of-the-art performance in brain tumor segmentation from MRI scans. While recent studies in other medical imaging domains have revealed that integrating textual reports with visual data can enhance segmentation accuracy, the field of brain tumor analysis lacks a comprehensive dataset that combines radiological images with corresponding textual annotations. This limitation has hindered the exploration of multimodal approaches that leverage both imaging and textual data. To bridge this critical gap, we introduce the TextBraTS dataset, the first publicly available volume-level multimodal dataset that contains paired MRI volumes and rich textual annotations, derived from the widely adopted BraTS2020 benchmark. Building upon this novel dataset, we propose a novel baseline framework and sequential cross-attention method for text-guided volumetric medical image segmentation. Through extensive experiments with various text-image fusion strategies and templated text formulations, our approach demonstrates significant improvements in brain tumor segmentation accuracy, offering valuable insights into effective multimodal integration techniques. Our dataset, implementation code, and pre-trained models are publicly available at https://github.com/Jupitern52/TextBraTS.

Problem

Research questions and friction points this paper is trying to address.

Lack of multimodal dataset for brain tumor segmentation

Need for text-guided volumetric image segmentation methods

Improving accuracy by fusing MRI scans with textual data

Innovation

Methods, ideas, or system contributions that make the work stand out.

TextBraTS dataset with paired MRI and text

Novel baseline framework for segmentation

Sequential cross-attention for text-image fusion

🔎 Similar Papers

An Integrated Deep Learning Framework for Effective Brain Tumor Localization, Segmentation, and Classification from Magnetic Resonance Images