🤖 AI Summary
Medical language-guided segmentation is hindered by its reliance on paired image–report data, rendering vast quantities of unreported images unusable and impeding real-time clinical deployment, since segmentation would have to wait for report generation. To address this, we propose ProLearn, the first prototype-driven learning framework for language-guided medical segmentation. ProLearn constructs a discrete, compact prototype space and introduces a Prototype-driven Semantic Approximation (PSA) module that approximates the semantic guidance normally derived from text, without requiring textual input. It combines semantic distillation with a query-and-respond mechanism to generate robust, text-free guidance signals. Extensive experiments on QaTa-COV19, MosMedData+, and Kvasir-SEG demonstrate that ProLearn significantly outperforms state-of-the-art methods under text-scarce conditions. Crucially, it decouples segmentation from report availability, enhancing clinical applicability in real-time, pre-reporting scenarios.
📝 Abstract
Medical language-guided segmentation, which integrates textual clinical reports as auxiliary guidance to enhance image segmentation, has demonstrated significant improvements over unimodal approaches. However, its inherent reliance on paired image–text input, which we refer to as "textual reliance", presents two fundamental limitations: 1) many medical segmentation datasets lack paired reports, leaving a substantial portion of image-only data underutilized for training; and 2) inference is limited to retrospective analysis of cases with paired reports, restricting applicability in most clinical scenarios, where segmentation typically precedes reporting. To address these limitations, we propose ProLearn, the first Prototype-driven Learning framework for language-guided segmentation that fundamentally alleviates textual reliance. At its core, ProLearn introduces a novel Prototype-driven Semantic Approximation (PSA) module that approximates the semantic guidance otherwise derived from textual input. PSA initializes a discrete and compact prototype space by distilling segmentation-relevant semantics from textual reports. Once initialized, it supports a query-and-respond mechanism that approximates semantic guidance for images without textual input, thereby alleviating textual reliance. Extensive experiments on QaTa-COV19, MosMedData+, and Kvasir-SEG demonstrate that ProLearn outperforms state-of-the-art language-guided methods when limited text is available.
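To make the query-and-respond idea concrete, the sketch below illustrates one plausible reading of such a mechanism: a bank of normalized prototype vectors (here seeded by simple random selection from text-derived features, standing in for the paper's semantic distillation, whose details are not given in the abstract) is queried by an image feature via cosine similarity, and the "response" is a similarity-weighted combination of prototypes serving as text-free guidance. All function names, the temperature parameter, and the initialization scheme are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def init_prototypes(text_features, num_prototypes, seed=0):
    """Build a compact prototype bank from text-derived features.
    (Illustrative stand-in for ProLearn's semantic distillation:
    we randomly select and L2-normalize a subset of feature vectors.)"""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(text_features), size=num_prototypes, replace=False)
    protos = text_features[idx]
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def query_and_respond(image_feature, prototypes, temperature=0.1):
    """Approximate text-free semantic guidance: the image feature queries
    the prototype bank by cosine similarity, and the response is a
    softmax-weighted combination of the prototypes."""
    q = image_feature / np.linalg.norm(image_feature)
    sims = prototypes @ q                # cosine similarity to each prototype
    weights = np.exp(sims / temperature)
    weights /= weights.sum()             # softmax over the prototype bank
    return weights @ prototypes          # approximated guidance vector
```

At inference time only `query_and_respond` is needed, which is the sense in which segmentation is decoupled from report availability: the text enters once, during prototype initialization, and never again.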