LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models

📅 2025-07-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Low-dose CT (LDCT) images suffer from severe noise and poor quality that can compromise diagnostic accuracy, and existing deep learning denoisers focus mainly on pixel-level reconstruction without semantic guidance. To address this, the authors propose LangMamba, a language-driven Mamba framework for LDCT denoising. A frozen vision-language model is used to extract anatomical semantic priors and pre-train a Language-guided AutoEncoder (LangAE); a Semantic-Enhanced Efficient Denoiser (SEED) combines these local semantics with Mamba's efficient global sequence modeling; and a Language-engaged Dual-space Alignment (LangDA) loss aligns denoised images with normal-dose CT in both perceptual and semantic spaces. On two public benchmarks, LangMamba outperforms state-of-the-art methods, yielding more faithful anatomical detail recovery and higher visual fidelity; LangAE generalizes well to unseen datasets, and the LangDA loss adds language-guided interpretability in a plug-and-play fashion.

📝 Abstract
Low-dose computed tomography (LDCT) reduces radiation exposure but often degrades image quality, potentially compromising diagnostic accuracy. Existing deep learning-based denoising methods focus primarily on pixel-level mappings, overlooking the potential benefits of high-level semantic guidance. Recent advances in vision-language models (VLMs) suggest that language can serve as a powerful tool for capturing structured semantic information, offering new opportunities to improve LDCT reconstruction. In this paper, we introduce LangMamba, a Language-driven Mamba framework for LDCT denoising that leverages VLM-derived representations to enhance supervision from normal-dose CT (NDCT). LangMamba follows a two-stage learning strategy. First, we pre-train a Language-guided AutoEncoder (LangAE) that leverages frozen VLMs to map NDCT images into a semantic space enriched with anatomical information. Second, we synergize LangAE with two key components to guide LDCT denoising: a Semantic-Enhanced Efficient Denoiser (SEED), which enhances NDCT-relevant local semantics while capturing global features with an efficient Mamba mechanism, and a Language-engaged Dual-space Alignment (LangDA) loss, which ensures that denoised images align with NDCT in both perceptual and semantic spaces. Extensive experiments on two public datasets demonstrate that LangMamba outperforms conventional state-of-the-art methods, significantly improving detail preservation and visual fidelity. Remarkably, LangAE exhibits strong generalizability to unseen datasets, thereby reducing training costs. Furthermore, the LangDA loss improves explainability by integrating language-guided insights into image reconstruction and can be used in a plug-and-play fashion. Our findings shed new light on the potential of language as a supervisory signal to advance LDCT denoising. The code is publicly available at https://github.com/hao1635/LangMamba.
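The dual-space alignment idea in the abstract can be illustrated with a minimal sketch: penalize the mismatch between the denoised image and the NDCT target in two feature spaces at once. This is not the paper's implementation; both encoders below are hypothetical fixed random projections standing in for the perceptual features and the frozen-VLM semantic features, and `w_sem` is an assumed balancing weight.

```python
import random

random.seed(0)

def make_encoder(in_dim, out_dim):
    """Fixed random linear projection as a stand-in feature encoder."""
    W = [[random.gauss(0, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    def encode(x):
        return [sum(w_i * x_i for w_i, x_i in zip(row, x)) for row in W]
    return encode

# Stand-ins for the perceptual (image-feature) space and the
# language-derived semantic space described in the abstract.
perceptual = make_encoder(64, 16)
semantic = make_encoder(64, 8)

def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def dual_space_loss(denoised, ndct, w_sem=1.0):
    """Align denoised output with NDCT in both feature spaces."""
    return (mse(perceptual(denoised), perceptual(ndct))
            + w_sem * mse(semantic(denoised), semantic(ndct)))

# Toy flattened "images": an NDCT target and a noisy estimate of it.
ndct = [random.gauss(0, 1) for _ in range(64)]
noisy = [x + 0.1 * random.gauss(0, 1) for x in ndct]
```

Because both terms are computed against the same NDCT target, the loss vanishes only when the denoised image matches the target in both spaces, which is the sense in which the supervision is "dual-space."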
Problem

Research questions and friction points this paper is trying to address.

Improves LDCT image quality using vision-language models
Enhances semantic guidance for better CT reconstruction
Reduces radiation exposure while preserving diagnostic accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages VLM-derived semantic representations for CT denoising
Uses Language-guided AutoEncoder enriched with anatomical information
Integrates a Semantic-Enhanced Efficient Denoiser (SEED) and a dual-space alignment loss (LangDA)
Zhihao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Tao Chen
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Chenhui Wang
PhD Candidate, Fudan University
AI for Neuroscience, Computer Vision
Qi Gao
Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai 200433, China
Huidong Xie
Yale University
Medical Imaging, Nuclear Imaging, PET, SPECT, Deep Learning
Chuang Niu
Rensselaer Polytechnic Institute
Multimodal Medical AI, Foundation Model, Computer Vision, Medical Imaging
Ge Wang
Biomedical Imaging Center, Center for Biotechnology and Interdisciplinary Studies, Department of Biomedical Engineering, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
Hongming Shan
Fudan University; Rensselaer Polytechnic Institute
Machine Learning, Medical Imaging, Computer Vision