TubeMLLM: A Foundation Model for Topology Knowledge Exploration in Vessel-like Anatomy

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenges of modeling tubular anatomical structures in medical imaging, where complex topology and data distribution shifts often lead to fragmented or spurious connections. To this end, we propose the first unified foundation architecture based on multimodal large language models. By injecting topological priors through natural language prompts, aligning visual and linguistic representations via a shared attention mechanism, and employing an adaptive loss weighting strategy, our approach enables topology-aware controllable generation. Our contributions include TubeMData—the first topology-centric multimodal benchmark—along with a zero-shot cross-modal transfer method and the adoption of topological metrics such as β₀ error for evaluation. The model achieves state-of-the-art performance across 15 datasets, reducing β₀ error in retinal images from 37.42 to 8.58, attaining a zero-shot Dice score of 67.50% (β₀ error: 1.21) on X-ray angiography, and achieving 97.38% accuracy in topological quality assessment.

📝 Abstract
Modeling medical vessel-like anatomy is challenging due to its intricate topology and sensitivity to dataset shifts. Consequently, task-specific models often suffer from topological inconsistencies, including artificial disconnections and spurious merges. Motivated by the promise of multimodal large language models (MLLMs) for zero-shot generalization, we propose TubeMLLM, a unified foundation model that couples structured understanding with controllable generation for medical vessel-like anatomy. By integrating topological priors through explicit natural language prompting and aligning them with visual representations in a shared-attention architecture, TubeMLLM significantly enhances topology-aware perception. Furthermore, we construct TubeMData, a pioneering multimodal benchmark comprising comprehensive topology-centric tasks, and introduce an adaptive loss weighting strategy to emphasize topology-critical regions during training. Extensive experiments on fifteen diverse datasets demonstrate the superiority of our approach. Quantitatively, TubeMLLM achieves state-of-the-art out-of-distribution performance, substantially reducing global topological discrepancies on color fundus photography (decreasing the $\beta_{0}$ error from 37.42 to 8.58 relative to baselines). Notably, TubeMLLM exhibits exceptional zero-shot cross-modality transfer on unseen X-ray angiography, achieving a Dice score of 67.50% while significantly reducing the $\beta_{0}$ error to 1.21. TubeMLLM also remains robust to degradations such as blur, noise, and low resolution. Furthermore, in topology-aware understanding tasks, the model achieves 97.38% accuracy in evaluating mask topological quality, significantly outperforming standard vision-language baselines.
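The $\beta_{0}$ error reported above measures the discrepancy in the 0-th Betti number, i.e., the number of connected components, between a predicted vessel mask and the ground truth: a fragmented prediction inflates the component count, a spurious merge deflates it. A minimal sketch of this metric (not the paper's implementation) using connected-component labeling from SciPy:

```python
import numpy as np
from scipy import ndimage

def beta0_error(pred_mask: np.ndarray, gt_mask: np.ndarray) -> int:
    """Absolute difference in the number of connected components
    (the 0-th Betti number) between two binary masks."""
    _, n_pred = ndimage.label(pred_mask)  # label() returns (labels, count)
    _, n_gt = ndimage.label(gt_mask)
    return abs(n_pred - n_gt)

# Toy example: the ground-truth vessel is one connected piece,
# while the prediction artificially breaks it into two.
gt = np.array([[1, 1, 1, 1]])
pred = np.array([[1, 1, 0, 1]])
print(beta0_error(pred, gt))  # 1
```

For 3D volumes or a different connectivity (e.g., 8-connected in 2D), `ndimage.label` accepts a `structure` argument defining the neighborhood.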
Problem

Research questions and friction points this paper is trying to address.

topology
vessel-like anatomy
topological inconsistency
medical imaging
dataset shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

topology-aware modeling
multimodal foundation model
zero-shot cross-modality transfer
structured medical understanding
adaptive loss weighting
Yaoyu Liu
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Minghui Zhang
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Xin You
Beihang University
Performance Tools; HPC
Hanxiao Zhang
Nanjing University
Yun Gu
Shanghai Jiao Tong University
Medical Image Analysis; Computer-Assisted Intervention