TextBraTS: Text-Guided Volumetric Brain Tumor Segmentation with Innovative Dataset Development and Fusion Module Exploration

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of large-scale, volume-level paired multimodal datasets comprising MRI volumes and corresponding clinical text annotations—limiting vision-language co-modeling in brain tumor analysis—this work introduces TextBraTS, the first publicly available, large-scale, volume-aligned multimodal brain tumor dataset. We further propose a text-guided 3D segmentation framework featuring a novel serialized cross-modal cross-attention mechanism that enables fine-grained alignment between BERT-encoded clinical text and hierarchical 3D U-Net features. Systematic evaluation of templated prompting strategies and multiple fusion schemes demonstrates consistent performance gains, yielding significant Dice score improvements of +2.1%–3.7% for whole tumor, tumor core, and enhancing tumor segmentation. The dataset, source code, and pre-trained models are fully open-sourced, establishing a foundational infrastructure and methodological paradigm for medical multimodal research.

📝 Abstract
Deep learning has demonstrated remarkable success in medical image segmentation and computer-aided diagnosis. In particular, numerous advanced methods have achieved state-of-the-art performance in brain tumor segmentation from MRI scans. While recent studies in other medical imaging domains have revealed that integrating textual reports with visual data can enhance segmentation accuracy, the field of brain tumor analysis lacks a comprehensive dataset that combines radiological images with corresponding textual annotations. This limitation has hindered the exploration of multimodal approaches that leverage both imaging and textual data. To bridge this critical gap, we introduce the TextBraTS dataset, the first publicly available volume-level multimodal dataset that contains paired MRI volumes and rich textual annotations, derived from the widely adopted BraTS2020 benchmark. Building upon this dataset, we propose a novel baseline framework and a sequential cross-attention method for text-guided volumetric medical image segmentation. Through extensive experiments with various text-image fusion strategies and templated text formulations, our approach demonstrates significant improvements in brain tumor segmentation accuracy, offering valuable insights into effective multimodal integration techniques. Our dataset, implementation code, and pre-trained models are publicly available at https://github.com/Jupitern52/TextBraTS.
Problem

Research questions and friction points this paper is trying to address.

Lack of multimodal dataset for brain tumor segmentation
Need for text-guided volumetric image segmentation methods
Improving accuracy by fusing MRI scans with textual data
Innovation

Methods, ideas, or system contributions that make the work stand out.

TextBraTS dataset with paired MRI and text
Novel baseline framework for segmentation
Sequential cross-attention for text-image fusion
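The core fusion idea above can be sketched as a single cross-attention step in which flattened 3D visual tokens attend to BERT-encoded text tokens. This is a minimal NumPy illustration, not the paper's implementation: the shapes, projection matrices, and residual fusion are assumptions for demonstration, and the actual framework applies the mechanism sequentially across hierarchical U-Net features.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, text, Wq, Wk, Wv):
    """One cross-attention step: visual tokens (queries) attend to
    text tokens (keys/values), fused back with a residual connection."""
    q = visual @ Wq                            # (N, d) queries from image features
    k = text @ Wk                              # (L, d) keys from text features
    v = text @ Wv                              # (L, d) values from text features
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, L) scaled dot-product
    attn = softmax(scores, axis=-1)            # each visual token's weights over text
    return visual + attn @ v                   # residual fusion, (N, d)

rng = np.random.default_rng(0)
d = 32
# hypothetical flattened 3D feature map: a 4x4x4 volume -> 64 visual tokens
visual = rng.standard_normal((64, d))
# hypothetical BERT-encoded report: 16 text tokens
text = rng.standard_normal((16, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

fused = cross_attention(visual, text, Wq, Wk, Wv)
print(fused.shape)  # (64, 32): same shape as the visual features, now text-conditioned
```

Because the fused output keeps the visual feature shape, it can be reshaped back to the 3D grid and passed to the next decoder stage, which is what makes this kind of fusion easy to slot into a U-Net hierarchy.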
Xiaoyu Shi
Ritsumeikan University, Osaka, Japan
Rahul Kumar Jain
Ritsumeikan University, Osaka, Japan
Yinhao Li
Ritsumeikan University, Osaka, Japan
Ruibo Hou
UIUC
Jingliang Cheng
Department of Magnetic Resonance Imaging, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
Jie Bai
Department of Magnetic Resonance Imaging, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
Guohua Zhao
Department of Magnetic Resonance Imaging, the First Affiliated Hospital of Zhengzhou University, Zhengzhou, China
Lanfen Lin
Zhejiang University, Hangzhou, China
Rui Xu
Dalian University of Technology, Dalian, China
Yen-wei Chen
Ritsumeikan University, Osaka, Japan