MadCLIP: Few-shot Medical Anomaly Detection with CLIP

📅 2025-06-30

🤖 AI Summary

This work addresses few-shot anomaly detection in medical imaging. It covers both image-level anomaly classification (AC) and pixel-level anomaly segmentation (AS), without relying on synthetic data or memory banks. The method adapts CLIP with a dual-branch design: learnable adapters in the CLIP vision encoder capture normal and abnormal features in separate branches, and learnable text prompts improve the alignment between visual features and text semantics. The SigLIP loss, adapted to the medical field for the first time, handles the many-to-one relationship between images and unpaired text prompts. The method outperforms existing approaches for AC and AS on multiple medical imaging modalities, in both same-dataset and cross-dataset evaluations, and an ablation study confirms the contribution of each component. Key contributions are: (i) a CLIP-based few-shot anomaly detection framework for medical imaging that, unlike prior work, requires neither synthetic data nor memory banks; (ii) the first adaptation of the SigLIP loss to the medical field; and (iii) a dual-branch architecture that separates normal and abnormal feature representations.
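The relationship between AC and AS can be illustrated with a minimal sketch. It assumes that per-patch image features and one "normal" and one "abnormal" prompt embedding already live in a shared CLIP-like space. The function names, the temperature tau=0.07, the 0.5 threshold, and the use of the maximum patch score as the image-level score are all illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def anomaly_map(patch_feats, normal_text, abnormal_text, tau=0.07):
    """Per-patch probability of 'abnormal' from a two-way softmax over prompt similarities.

    tau is an assumed CLIP-like temperature, not a value from the paper.
    """
    probs = []
    for f in patch_feats:
        s_n = cosine(f, normal_text) / tau
        s_a = cosine(f, abnormal_text) / tau
        m = max(s_n, s_a)  # subtract the max for numerical stability
        e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
        probs.append(e_a / (e_n + e_a))
    return probs


def classify_and_segment(patch_feats, grid_w, normal_text, abnormal_text, thresh=0.5):
    """Hypothetical AC/AS head: returns (image_score, binary patch mask)."""
    probs = anomaly_map(patch_feats, normal_text, abnormal_text)
    # AS: reshape the flat patch scores into a row-major grid and threshold them
    grid = [probs[r * grid_w:(r + 1) * grid_w] for r in range(len(probs) // grid_w)]
    mask = [[p > thresh for p in row] for row in grid]
    # AC: image-level score as the maximum patch score (one common heuristic, assumed here)
    image_score = max(probs)
    return image_score, mask
```

For example, calling classify_and_segment([[1, 0], [1, 0.1], [0, 1], [0.9, 0.2]], 2, [1, 0], [0, 1]) flags only the bottom-left patch in the mask and returns an image-level score close to 1. Taking the maximum makes the image-level decision sensitive to small anomalous regions. Averaging the top-k patch scores is a smoother alternative.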

📝 Abstract

We present a few-shot anomaly detection approach that leverages the pre-trained CLIP model for medical data and adapts it for both image-level anomaly classification (AC) and pixel-level anomaly segmentation (AS). A dual-branch design separately captures normal and abnormal features through learnable adapters in the CLIP vision encoder. To improve semantic alignment, learnable text prompts are linked to the visual features. Furthermore, the SigLIP loss is applied to handle the many-to-one relationship between images and unpaired text prompts, marking its first adaptation to the medical field. The approach is validated on multiple modalities and outperforms existing methods for AC and AS in both same-dataset and cross-dataset evaluations. Unlike prior work, it does not rely on synthetic data or memory banks, and an ablation study confirms the contribution of each component. The code is available at https://github.com/mahshid1998/MadCLIP.
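The SigLIP loss replaces CLIP's softmax contrastive loss with an independent binary sigmoid term for every image–prompt pair. As a result, many images can be positives for the same prompt, which fits the many-to-one setting described above. The sketch below is a generic version of that sigmoid loss, not the authors' implementation. It assumes each image is labeled with the index of the one prompt it should match (for example 0 = normal, 1 = abnormal), and it uses the temperature t=10 and bias b=-10 suggested as initial values in the SigLIP paper. In a real model both would be learnable, and MadCLIP's exact settings may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def siglip_loss(image_feats, text_feats, labels, t=10.0, b=-10.0):
    """Sigmoid (SigLIP-style) loss over all image-prompt pairs.

    labels[i] is the index of the prompt that image i should match; several
    images may share one prompt (many-to-one). Every other prompt is a negative.
    t and b are assumed initial values; in training they would be learned.
    """
    total = 0.0
    for i, x in enumerate(image_feats):
        for j, y in enumerate(text_feats):
            z = 1.0 if labels[i] == j else -1.0  # +1 for a matching pair, -1 otherwise
            logit = t * cosine(x, y) + b
            total += -log_sigmoid(z * logit)
    # SigLIP normalizes the summed pairwise terms by the number of images
    return total / len(image_feats)
```

With two prompts and three images, assigning each image to the prompt it is closest to gives a much lower loss than swapping the labels. Because each pair is scored independently, no softmax over the batch is needed to make one prompt "win" for one image.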

Problem

Research questions and friction points this paper is trying to address.

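Few-shot anomaly classification (AC) and segmentation (AS) in medical images using CLIP

Capturing normal and abnormal features separately from only a few examples

Aligning CLIP visual features with text semantics in the medical domain, without synthetic data or memory banks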

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages pre-trained CLIP for medical anomaly detection

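Uses a dual-branch design with learnable adapters in the CLIP vision encoder to separate normal and abnormal features

Employs learnable text prompts for semantic alignment

Applies the SigLIP loss to the many-to-one relationship between images and unpaired text prompts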
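The dual-branch idea can be sketched with two residual bottleneck adapters that transform the same frozen CLIP feature. One branch is matched against the normal prompt and the other against the abnormal prompt. The ResidualAdapter class, the bottleneck shape, the ReLU, the residual scale alpha=0.1, and the per-branch matching in dual_branch_scores are hypothetical choices for illustration. The paper's actual adapter layout and placement inside the vision encoder are not specified here.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


class ResidualAdapter:
    """Hypothetical bottleneck adapter: x + alpha * up(relu(down(x)))."""

    def __init__(self, down, up, alpha=0.1):
        self.down = down    # r x d projection (r < d), learnable in a real model
        self.up = up        # d x r projection back to the feature size
        self.alpha = alpha  # small residual scale keeps outputs close to frozen CLIP features

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.down, x)]  # ReLU bottleneck
        delta = matvec(self.up, hidden)
        return [xi + self.alpha * di for xi, di in zip(x, delta)]


def dual_branch_scores(feat, normal_adapter, abnormal_adapter, normal_text, abnormal_text):
    """Each branch adapts the same frozen feature and is matched to its own prompt."""
    normal_sim = cosine(normal_adapter(feat), normal_text)
    abnormal_sim = cosine(abnormal_adapter(feat), abnormal_text)
    return normal_sim, abnormal_sim
```

Keeping two separate adapters lets each branch specialize without forcing one shared projection to serve both classes. With alpha=0 an adapter reduces to the identity, so training starts from the pre-trained CLIP features.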

🔎 Similar Papers

Multi-modal vision-language model for generalizable annotation-free pathology localization and clinical diagnosis