TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of modeling tubular anatomical structures in medical imaging, where complex topology and data distribution shifts often lead to fragmented or spurious connections. To this end, we propose the first unified foundation architecture based on multimodal large language models. By injecting topological priors through natural language prompts, aligning visual and linguistic representations via a shared attention mechanism, and employing an adaptive loss weighting strategy, our approach enables topology-aware controllable generation. Our contributions include TubeMData—the first topology-centric multimodal benchmark—along with a zero-shot cross-modal transfer method and the adoption of topological metrics such as β₀ error for evaluation. The model achieves state-of-the-art performance across 15 datasets, reducing β₀ error in retinal images from 37.42 to 8.58, attaining a zero-shot Dice score of 67.50% (β₀ error: 1.21) on X-ray angiography, and achieving 97.38% accuracy in topological quality assessment.

📝 Abstract
Modeling medical vessel-like anatomy is challenging due to its intricate topology and sensitivity to dataset shifts. Consequently, task-specific models often suffer from topological inconsistencies, including artificial disconnections and spurious merges. Motivated by the promise of multimodal large language models (MLLMs) for zero-shot generalization, we propose TubeMLLM, a unified foundation model that couples structured understanding with controllable generation for medical vessel-like anatomy. By integrating topological priors through explicit natural language prompting and aligning them with visual representations in a shared-attention architecture, TubeMLLM significantly enhances topology-aware perception. Furthermore, we construct TubeMData, a pioneering multimodal benchmark comprising comprehensive topology-centric tasks, and introduce an adaptive loss weighting strategy to emphasize topology-critical regions during training. Extensive experiments on fifteen diverse datasets demonstrate the superiority of our approach. Quantitatively, TubeMLLM achieves state-of-the-art out-of-distribution performance, substantially reducing global topological discrepancies on color fundus photography (decreasing the $\beta_{0}$ error from 37.42 to 8.58 relative to baselines). Notably, TubeMLLM exhibits exceptional zero-shot cross-modality transfer on unseen X-ray angiography, achieving a Dice score of 67.50% while significantly reducing the $\beta_{0}$ error to 1.21. TubeMLLM also remains robust to degradations such as blur, noise, and low resolution. Furthermore, in topology-aware understanding tasks, the model achieves 97.38% accuracy in evaluating mask topological quality, significantly outperforming standard vision-language baselines.
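The $\beta_{0}$ error reported above measures the discrepancy in the 0-th Betti number, i.e., the number of connected components, between a predicted vessel mask and the ground truth: a fragmented prediction inflates the component count, a spurious merge deflates it. A minimal sketch of this metric (not the paper's implementation) using connected-component labeling from SciPy:

```python
import numpy as np
from scipy import ndimage

def beta0_error(pred_mask: np.ndarray, gt_mask: np.ndarray) -> int:
    """Absolute difference in the number of connected components
    (the 0-th Betti number) between two binary masks."""
    _, n_pred = ndimage.label(pred_mask)  # label() returns (labels, count)
    _, n_gt = ndimage.label(gt_mask)
    return abs(n_pred - n_gt)

# Toy example: the ground-truth vessel is one connected piece,
# while the prediction artificially breaks it into two.
gt = np.array([[1, 1, 1, 1]])
pred = np.array([[1, 1, 0, 1]])
print(beta0_error(pred, gt))  # 1
```

For 3D volumes or a different connectivity (e.g., 8-connected in 2D), `ndimage.label` accepts a `structure` argument defining the neighborhood.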
Problem

Research questions and friction points this paper is trying to address.

topology
vessel-like anatomy
topological inconsistency
medical imaging
dataset shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

topology-aware modeling
multimodal foundation model
zero-shot cross-modality transfer
structured medical understanding
adaptive loss weighting
Yaoyu Liu
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Minghui Zhang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Xin You
Beihang University
Performance Tools; HPC
Hanxiao Zhang
Nanjing University
Yun Gu
Shanghai Jiao Tong University
Medical Image Analysis; Computer-Assisted Intervention