Semantic-Topological Graph Reasoning for Language-Guided Pulmonary Screening

Date: 2026-04-07
AI Summary
This study addresses key challenges in clinical free-text–guided segmentation of pulmonary medical images, including semantic ambiguity, anatomical structure overlap, and overfitting of large models under limited training data. To tackle these issues, the authors propose a novel framework that integrates a large language model (LLaMA-3-V) with a vision foundation model (MedSAM). The approach leverages text-to-vision intent distillation to extract diagnostic guidance and formulates lesion mask selection as a dynamic semantic-topological graph reasoning problem. A selective asymmetric fine-tuning strategy is introduced, updating fewer than 1% of model parameters. Evaluated on the LIDC-IDRI dataset, the method achieves a Dice coefficient of 81.5%, outperforming state-of-the-art approaches such as LISA by over 5%, while exhibiting high stability with a five-fold cross-validation variance of only 0.6%.
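The Dice coefficient quoted above is the standard overlap metric for segmentation: twice the intersection of the predicted and reference masks, divided by the sum of their sizes. The following is a minimal sketch of that formula on small binary masks. The function name `dice_coefficient`, the nested-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not details taken from the paper.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks.

    Masks are nested lists of 0/1 values with identical shapes
    (an illustrative representation, not the paper's data format).
    """
    pred_flat = [int(bool(v)) for row in pred for v in row]
    target_flat = [int(bool(v)) for row in target for v in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    # |A ∩ B|: pixels marked as lesion in both masks
    intersection = sum(p & t for p, t in zip(pred_flat, target_flat))
    # |A| + |B|: total lesion pixels across both masks
    total = sum(pred_flat) + sum(target_flat)

    # Assumed convention: two empty masks count as a perfect match
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy example: 3 predicted pixels, 3 reference pixels, 2 shared
pred = [[0, 1, 1],
        [0, 1, 0]]
target = [[0, 1, 0],
          [0, 1, 1]]
score = dice_coefficient(pred, target)  # 2 * 2 / (3 + 3)
```

A reported DSC of 81.5% corresponds to a value of 0.815 on this scale, typically averaged over the evaluation cases.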
Abstract
Medical image segmentation driven by free-text clinical instructions is a critical frontier in computer-aided diagnosis. However, existing multimodal and foundation models struggle with the semantic ambiguity of clinical reports and fail to disambiguate complex anatomical overlaps in low-contrast scans. Furthermore, fully fine-tuning these massive architectures on limited medical datasets invariably leads to severe overfitting. To address these challenges, we propose a novel Semantic-Topological Graph Reasoning (STGR) framework for language-guided pulmonary screening. Our approach elegantly synergizes the reasoning capabilities of large language models (LLaMA-3-V) with the zero-shot delineation of vision foundation models (MedSAM). Specifically, we introduce a Text-to-Vision Intent Distillation (TVID) module to extract precise diagnostic guidance. To resolve anatomical ambiguity, we formulate mask selection as a dynamic graph reasoning problem, where candidate lesions are modeled as nodes and edges capture spatial and semantic affinities. To ensure deployment feasibility, we introduce a Selective Asymmetric Fine-Tuning (SAFT) strategy that updates less than 1% of the parameters. Rigorous 5-fold cross-validation on the LIDC-IDRI and LNDb datasets demonstrates that our framework establishes a new state-of-the-art. Notably, it achieves an 81.5% Dice Similarity Coefficient (DSC) on LIDC-IDRI, outperforming leading LLM-based tools like LISA by over 5%. Crucially, our SAFT strategy acts as a powerful regularizer, yielding exceptional cross-fold stability (0.6% DSC variance) and paving the way for robust, context-aware clinical deployment.
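The abstract describes mask selection as graph reasoning over candidate lesions, with nodes for candidates and edges for spatial and semantic affinities, but gives no equations. The sketch below is one possible reading of that idea, not the authors' STGR implementation. Everything in it is an assumption for illustration: the embedding vectors, the bounding boxes, the Gaussian distance kernel, the `scale` and `alpha` parameters, and the single round of score propagation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def spatial_affinity(box_a, box_b, scale=50.0):
    """Assumed spatial affinity: Gaussian kernel on centroid distance.

    Boxes are (x_min, y_min, x_max, y_max) in pixels.
    """
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    dist = math.hypot(ax - bx, ay - by)
    return math.exp(-((dist / scale) ** 2))


def select_mask(candidates, intent, alpha=0.3):
    """Score candidate masks (nodes) and return (best_index, scores).

    Each node starts from its similarity to the text-derived intent vector,
    then receives support from neighbours, weighted by an edge affinity that
    multiplies spatial and semantic terms. One propagation round only.
    """
    n = len(candidates)
    base = [cosine(c["embedding"], intent) for c in candidates]

    scores = []
    for i in range(n):
        support = 0.0
        for j in range(n):
            if i == j:
                continue
            # Edge weight: nearby AND semantically similar candidates
            weight = spatial_affinity(candidates[i]["box"], candidates[j]["box"])
            weight *= max(0.0, cosine(candidates[i]["embedding"],
                                      candidates[j]["embedding"]))
            support += weight * base[j]
        scores.append(base[i] + alpha * support)

    best = max(range(n), key=scores.__getitem__)
    return best, scores


# Hypothetical candidates: two adjacent, intent-consistent regions and one
# distant, off-intent region.
candidates = [
    {"embedding": [0.9, 0.1], "box": (10, 10, 20, 20)},
    {"embedding": [0.8, 0.2], "box": (12, 12, 22, 22)},
    {"embedding": [0.1, 0.9], "box": (200, 200, 210, 210)},
]
intent = [1.0, 0.0]
best, scores = select_mask(candidates, intent)
```

In this toy setup the first candidate is selected: it matches the intent most closely and is supported by a nearby, similar neighbour, while the distant candidate receives almost no support. A learned graph network would replace the hand-written affinities and propagation rule used here.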
Problem

Research questions and friction points this paper is trying to address.

semantic ambiguity
anatomical overlap
medical image segmentation
overfitting
language-guided screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic-Topological Graph Reasoning
Text-to-Vision Intent Distillation
Selective Asymmetric Fine-Tuning
Language-Guided Segmentation
Zero-Shot Medical Vision