Pre-Trained LLM is a Semantic-Aware and Generalizable Segmentation Booster

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Leveraging the semantic reasoning capabilities of large language models (LLMs) for medical image segmentation without incurring substantial trainable-parameter overhead remains challenging. Method: We propose LLM4Seg, a framework that integrates frozen, pre-trained LLM layers (e.g., from LLaMA or DeepSeek) into a CNN encoder-decoder architecture, enabling direct processing of visual tokens and endowing the model with strong semantic awareness. Semantic enhancement of the visual tokens is achieved via lightweight fine-tuning that introduces only minimal trainable parameters. Contribution/Results: Evaluated across multiple medical imaging modalities, including ultrasound, dermoscopy, colonoscopy, and CT, LLM4Seg consistently improves segmentation performance, enhancing both global contextual modeling and local detail fidelity. Crucially, it transfers the LLM's semantic understanding to purely visual segmentation tasks *without* requiring vision-language alignment pretraining, establishing a new low-parameter paradigm for medical image segmentation.

📝 Abstract
With the advancement of Large Language Models (LLMs) in natural language processing, this paper presents an intriguing finding: a frozen pre-trained LLM layer can process visual tokens for medical image segmentation tasks. Specifically, we propose a simple hybrid structure that integrates a pre-trained, frozen LLM layer within a CNN encoder-decoder segmentation framework (LLM4Seg). Surprisingly, this design improves segmentation performance with a minimal increase in trainable parameters across various modalities, including ultrasound, dermoscopy, colonoscopy, and CT scans. Our in-depth analysis reveals the potential of transferring the LLM's semantic awareness to enhance segmentation tasks, offering both improved global understanding and better local modeling capabilities. The improvement proves robust across different LLMs, validated using LLaMA and DeepSeek.
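The hybrid design described in the abstract can be sketched as follows. This is a minimal illustrative PyTorch model, not the authors' implementation: a generic frozen `nn.TransformerEncoderLayer` stands in for the pre-trained LLM layer (the paper uses actual LLaMA/DeepSeek blocks), and the class name, layer sizes, and the residual fusion at the bottleneck are assumptions for illustration. The key idea it demonstrates is that only the small projections around the frozen layer add trainable parameters.

```python
import torch
import torch.nn as nn

class LLM4SegSketch(nn.Module):
    """Illustrative hybrid: CNN encoder -> frozen transformer layer -> CNN decoder.

    The frozen layer is a stand-in for a pre-trained LLM block; only the
    lightweight projections into and out of its token space are trainable.
    """
    def __init__(self, in_ch=1, feat_ch=32, llm_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(),
        )
        # trainable projections into/out of the "LLM" token dimension
        self.proj_in = nn.Linear(feat_ch, llm_dim)
        self.proj_out = nn.Linear(llm_dim, feat_ch)
        # stand-in for a frozen, pre-trained LLM layer
        self.llm_layer = nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=4, batch_first=True)
        for p in self.llm_layer.parameters():
            p.requires_grad = False
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat_ch, feat_ch, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(feat_ch, 1, 2, stride=2),
        )

    def forward(self, x):
        f = self.encoder(x)                    # (B, C, H/4, W/4)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, C) visual tokens
        tokens = self.proj_out(self.llm_layer(self.proj_in(tokens)))
        f = f + tokens.transpose(1, 2).view(b, c, h, w)  # residual fusion
        return self.decoder(f)                 # per-pixel segmentation logits
```

For a 64x64 single-channel input, `LLM4SegSketch()(torch.randn(2, 1, 64, 64))` returns logits of shape `(2, 1, 64, 64)`, while all parameters of `llm_layer` report `requires_grad == False`, mirroring the paper's claim of semantic enhancement at minimal trainable-parameter cost.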
Problem

Research questions and friction points this paper is trying to address.

Enhancing medical image segmentation with pre-trained LLMs
Transferring LLM semantic awareness to improve segmentation tasks
Robust performance across diverse medical imaging modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frozen pre-trained LLM processes visual tokens
Hybrid CNN-LLM framework boosts segmentation
Transfers LLM semantics to enhance segmentation
Fenghe Tang
University of Science and Technology of China
Medical Image Analysis, Foundation model
Wenxin Ma
University of Science and Technology of China
AI, computer vision
Zhiyang He
Massachusetts Institute of Technology
Quantum Information
Xiaodong Tao
Anhui IFLYTEK CO., Ltd.
Zihang Jiang
School of Biomedical Engineering, USTC, Suzhou Institute for Advanced Research
Computer Vision, Medical Imaging, 3D
S. Kevin Zhou
School of Biomedical Engineering, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China; Center for Medical Imaging, Robotics, Analytic Computing & Learning (MIRACLE), Suzhou Institute for Advanced Research, USTC, 215123, P.R. China