TAMISeg: Text-Aligned Multi-scale Medical Image Segmentation with Semantic Encoder Distillation

📅 2026-04-12

🤖 AI Summary
This work addresses three challenges in medical image segmentation: scarce fine-grained annotations, anatomical complexity, and image degradation. It introduces TAMISeg, a framework that combines clinical text prompts with semantic distillation from a frozen DINOv3 vision teacher. TAMISeg comprises a consistency-aware encoder pretrained with strong perturbations, a semantic encoder distillation module, and a scale-adaptive decoder, which together reduce reliance on pixel-level annotations while improving semantic discriminability and robustness. On the Kvasir-SEG, MosMedData+, and QaTa-COV19 datasets, the method is reported to consistently outperform existing uni-modal and multi-modal approaches.

📝 Abstract
Medical image segmentation remains challenging due to limited fine-grained annotations, complex anatomical structures, and image degradation from noise, low contrast, or illumination variation. We propose TAMISeg, a text-guided segmentation framework that incorporates clinical language prompts and semantic distillation as auxiliary semantic cues to enhance visual understanding and reduce reliance on pixel-level fine-grained annotations. TAMISeg integrates three core components: a consistency-aware encoder pretrained with strong perturbations for robust feature extraction, a semantic encoder distillation module with supervision from a frozen DINOv3 teacher to enhance semantic discriminability, and a scale-adaptive decoder that segments anatomical structures across different spatial scales. Experiments on the Kvasir-SEG, MosMedData+, and QaTa-COV19 datasets demonstrate that TAMISeg consistently outperforms existing uni-modal and multi-modal methods in both qualitative and quantitative evaluations. Code will be made publicly available at https://github.com/qczggaoqiang/TAMISeg.
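
As an illustration of the semantic encoder distillation idea, the following minimal sketch computes a feature-matching loss between a trainable student encoder and a frozen teacher. The abstract does not state the loss used in TAMISeg, so the cosine-distance form, the function names `cosine_similarity` and `distillation_loss`, and the toy token vectors are all assumptions made for illustration; they are not taken from the paper or its code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as uncorrelated
    return dot / (norm_u * norm_v)


def distillation_loss(student_tokens, teacher_tokens):
    """Mean (1 - cosine similarity) over paired patch tokens.

    Hypothetical loss form: the teacher tokens stand in for features of a
    frozen teacher (DINOv3 in the paper) and are treated as constants.
    """
    if not student_tokens or len(student_tokens) != len(teacher_tokens):
        raise ValueError("need the same non-zero number of student and teacher tokens")
    total = sum(
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_tokens, teacher_tokens)
    )
    return total / len(student_tokens)


# Toy example: two patch tokens with two feature channels each.
teacher = [[1.0, 0.0], [0.0, 2.0]]
aligned_student = [[2.0, 0.0], [0.0, 1.0]]     # same directions as the teacher
orthogonal_student = [[0.0, 1.0], [1.0, 0.0]]  # orthogonal to the teacher

print(distillation_loss(aligned_student, teacher))
print(distillation_loss(orthogonal_student, teacher))
```

The loss depends only on feature direction, not magnitude, so a student whose tokens point the same way as the teacher's incurs no penalty. In a real training loop such a term would typically be added to the segmentation loss with a weighting factor, and gradients would flow only through the student.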

Problem

Research questions and friction points this paper is trying to address.

medical image segmentation
limited annotations
anatomical complexity
image degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

text-guided segmentation
semantic distillation
multi-scale medical image segmentation
DINOv3
consistency-aware encoder