🤖 AI Summary
Leveraging the semantic reasoning capabilities of large language models (LLMs) for medical image segmentation without incurring substantial trainable-parameter overhead remains challenging. Method: We propose LLM4Seg, a novel framework that integrates frozen, pre-trained LLM layers (e.g., from LLaMA or DeepSeek) into a CNN encoder-decoder architecture, enabling them to process visual tokens directly and endowing the model with strong semantic awareness. Semantic enhancement of the visual tokens is achieved via lightweight fine-tuning that introduces only a minimal number of trainable parameters. Contribution/Results: Evaluated across multiple medical imaging modalities, including ultrasound, dermoscopy, colonoscopy, and CT, LLM4Seg consistently improves segmentation performance, enhancing both global contextual modeling and local detail fidelity. Crucially, it transfers the LLM's semantic understanding to purely visual segmentation tasks *without* requiring vision-language alignment pretraining. This establishes a new low-parameter paradigm for medical image segmentation across modalities.
📝 Abstract
With the advancement of Large Language Models (LLMs) for natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within a CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, colonoscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
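The hybrid structure described above can be sketched in a few lines of PyTorch. This is a minimal illustration, not the paper's exact architecture: the channel/token sizes are made up, and an untrained `nn.TransformerEncoderLayer` stands in for the frozen pre-trained LLM layer (in LLM4Seg this would be an actual LLaMA or DeepSeek block). The key ideas shown are (1) flattening CNN features into visual tokens, (2) passing them through a frozen transformer layer wrapped by lightweight trainable projections, and (3) decoding back to a segmentation map.

```python
import torch
import torch.nn as nn

class LLM4SegSketch(nn.Module):
    """Illustrative sketch of the LLM4Seg idea: a frozen transformer layer
    (stand-in for a pre-trained LLM layer) between a CNN encoder and decoder,
    with only small projection layers added as trainable parameters."""

    def __init__(self, in_ch=1, feat_ch=32, llm_dim=64):
        super().__init__()
        # CNN encoder: two stride-2 convolutions downsample the image 4x.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Lightweight trainable projections into/out of the "LLM" width.
        self.proj_in = nn.Linear(feat_ch, llm_dim)
        self.proj_out = nn.Linear(llm_dim, feat_ch)
        # Stand-in for one frozen pre-trained LLM layer (assumption: in the
        # real model this is a LLaMA/DeepSeek block with frozen weights).
        self.llm_layer = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=4, batch_first=True)
        for p in self.llm_layer.parameters():
            p.requires_grad = False  # the LLM layer is never trained
        # CNN decoder: upsample back to a per-pixel segmentation logit.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        f = self.encoder(x)                    # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W/16, C) visual tokens
        tokens = self.proj_out(self.llm_layer(self.proj_in(tokens)))
        # Residual connection: semantically enhanced tokens refine features.
        f = tokens.transpose(1, 2).reshape(b, c, h, w) + f
        return self.decoder(f)                 # (B, 1, H, W) logits

model = LLM4SegSketch()
out = model(torch.randn(2, 1, 64, 64))  # two 64x64 single-channel images
```

Because the transformer layer is frozen, the only new trainable parameters beyond the CNN backbone are the two small `Linear` projections, which is what keeps the trainable-parameter overhead minimal in this design.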