LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-dose CT (LDCT) images suffer from severe noise and poor quality that can compromise diagnostic accuracy, and existing deep learning denoisers focus mainly on pixel-level reconstruction without semantic guidance. To address this, the authors propose LangMamba, a language-driven Mamba framework for LDCT denoising. A frozen vision-language model is used to extract anatomical semantic priors and pre-train a Language-guided AutoEncoder (LangAE); a Semantic-Enhanced Efficient Denoiser (SEED) combines these local semantics with Mamba's efficient global sequence modeling; and a Language-engaged Dual-space Alignment (LangDA) loss aligns denoised images with normal-dose CT in both perceptual and semantic spaces. On two public benchmarks, LangMamba outperforms state-of-the-art methods, yielding more faithful anatomical detail recovery and higher visual fidelity; LangAE generalizes well to unseen datasets, and the LangDA loss adds language-guided interpretability in a plug-and-play fashion.

📝 Abstract
Low-dose computed tomography (LDCT) reduces radiation exposure but often degrades image quality, potentially compromising diagnostic accuracy. Existing deep learning-based denoising methods focus primarily on pixel-level mappings, overlooking the potential benefits of high-level semantic guidance. Recent advances in vision-language models (VLMs) suggest that language can serve as a powerful tool for capturing structured semantic information, offering new opportunities to improve LDCT reconstruction. In this paper, we introduce LangMamba, a Language-driven Mamba framework for LDCT denoising that leverages VLM-derived representations to enhance supervision from normal-dose CT (NDCT). LangMamba follows a two-stage learning strategy. First, we pre-train a Language-guided AutoEncoder (LangAE) that leverages frozen VLMs to map NDCT images into a semantic space enriched with anatomical information. Second, we synergize LangAE with two key components to guide LDCT denoising: a Semantic-Enhanced Efficient Denoiser (SEED), which enhances NDCT-relevant local semantics while capturing global features with an efficient Mamba mechanism, and a Language-engaged Dual-space Alignment (LangDA) loss, which ensures that denoised images align with NDCT in both perceptual and semantic spaces. Extensive experiments on two public datasets demonstrate that LangMamba outperforms conventional state-of-the-art methods, significantly improving detail preservation and visual fidelity. Remarkably, LangAE exhibits strong generalizability to unseen datasets, thereby reducing training costs. Furthermore, the LangDA loss improves explainability by integrating language-guided insights into image reconstruction and can be used in a plug-and-play fashion. Our findings shed new light on the potential of language as a supervisory signal to advance LDCT denoising. The code is publicly available at https://github.com/hao1635/LangMamba.
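The dual-space alignment idea in the abstract can be illustrated with a minimal sketch: penalize the mismatch between the denoised image and the NDCT target in two feature spaces at once. This is not the paper's implementation; both encoders below are hypothetical fixed random projections standing in for the perceptual features and the frozen-VLM semantic features, and `w_sem` is an assumed balancing weight.

```python
import random

random.seed(0)

def make_encoder(in_dim, out_dim):
    """Fixed random linear projection as a stand-in feature encoder."""
    W = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    def encode(x):
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    return encode

# Stand-ins for the perceptual (image-feature) space and the
# language-derived semantic space described in the abstract.
perceptual = make_encoder(64, 16)
semantic = make_encoder(64, 8)

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def dual_space_loss(denoised, ndct, w_sem=1.0):
    """Align denoised output with NDCT in both feature spaces."""
    return (mse(perceptual(denoised), perceptual(ndct))
            + w_sem * mse(semantic(denoised), semantic(ndct)))

# Toy flattened "images": an NDCT target and a noisy estimate of it.
ndct = [random.gauss(0, 1) for _ in range(64)]
noisy = [x + 0.1 * random.gauss(0, 1) for x in ndct]
```

Because both terms are computed against the same NDCT target, the loss vanishes only when the denoised image matches the target in both spaces, which is the sense in which the supervision is "dual-space."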
Problem

Research questions and friction points this paper is trying to address.

Improves LDCT image quality using vision-language models
Enhances semantic guidance for better CT reconstruction
Reduces radiation exposure while preserving diagnostic accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages VLM-derived semantic representations for CT denoising
Uses Language-guided AutoEncoder enriched with anatomical information
Integrates a Semantic-Enhanced Efficient Denoiser (SEED) and a dual-space alignment loss (LangDA)
Zhihao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Tao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Chenhui Wang
PhD Candidate, Fudan University
AI for Neuroscience, Computer Vision
Qi Gao
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Huidong Xie
Yale University
Medical Imaging, Nuclear Imaging, PET, SPECT, Deep Learning
Chuang Niu
Rensselaer Polytechnic Institute
Multimodal Medical AI, Foundation Model, Computer Vision, Medical Imaging
Ge Wang
Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
Hongming Shan
Fudan University; Rensselaer Polytechnic Institute
Machine Learning, Medical Imaging, Computer Vision