SD-MAD: Sign-Driven Few-shot Multi-Anomaly Detection in Medical Images

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Medical anomaly detection faces two coupled challenges: the difficulty of few-shot, multi-class recognition and data silos that limit access to training data. Existing few-shot methods treat all anomalies as a single class, ignoring inter-class semantic distinctions; conversely, multi-class settings suffer from severe label scarcity, leading to model under-fitting and ambiguous decision boundaries. This paper proposes a sign-driven, two-stage few-shot multi-class anomaly detection framework. First, a large language model generates fine-grained radiological sign descriptions for each anomaly category; second, joint vision-language modeling aligns signs with images across modalities, augmented by an adaptive sign-filtering mechanism that mitigates few-shot uncertainty. The key innovation is the use of clinically interpretable radiological signs as intermediate supervisory signals, explicitly encoding semantic differences across anomaly classes. Extensive experiments on multiple medical imaging benchmarks, evaluated under three standardized protocols, demonstrate significant improvements in few-shot multi-class detection accuracy and robustness.

📝 Abstract
Medical anomaly detection (AD) is crucial for early clinical intervention, yet it is hampered by limited access to high-quality medical imaging data owing to privacy concerns and data silos. Few-shot learning has emerged as a promising way to alleviate these limitations by leveraging the large-scale prior knowledge embedded in vision-language models (VLMs). Recent few-shot medical AD methods have treated normal and abnormal cases as a one-class classification problem, overlooking the distinctions among multiple anomaly categories. In this paper, we therefore propose a framework tailored for few-shot medical anomaly detection in scenarios where multiple anomaly categories must be identified. To capture the detailed radiological signs of each anomaly category, our framework incorporates diverse textual descriptions per category generated by a large language model (LLM), under the assumption that different anomalies in medical images may share common radiological signs within each category. Specifically, we introduce SD-MAD, a two-stage Sign-Driven few-shot Multi-Anomaly Detection framework: (i) radiological signs are aligned with anomaly categories by amplifying inter-anomaly discrepancy; (ii) the aligned signs are further selected via an automatic sign-selection strategy at inference, mitigating the under-fitting and uncertain-sample issues caused by limited medical data. Moreover, we propose three protocols to comprehensively quantify multi-anomaly detection performance. Extensive experiments illustrate the effectiveness of our method.
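At a high level, the two-stage pipeline described in the abstract can be sketched as a CLIP-style scoring rule: embed several LLM-generated sign descriptions per anomaly category, score an image against them, and keep only the best-aligned signs at inference. The sketch below is a minimal illustration using random stand-in embeddings; the encoder, the top-k selection rule, and all names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

# Hypothetical stand-ins for CLIP-style encoder outputs: each anomaly
# category has several LLM-generated radiological-sign descriptions,
# embedded as text vectors; the query image is embedded in the same space.
n_categories, signs_per_cat, dim = 3, 4, 32
sign_embeds = l2_normalize(rng.normal(size=(n_categories, signs_per_cat, dim)))
image_embed = l2_normalize(rng.normal(size=dim))

def classify(image_embed, sign_embeds, top_k=2):
    """Score each anomaly category by its best-aligned radiological signs.

    In the spirit of stage (ii): instead of averaging over all signs,
    keep only the top-k signs per category at inference to suppress
    poorly aligned, uncertain descriptions (an illustrative rule).
    """
    # Cosine similarity between the image and every sign description.
    sims = sign_embeds @ image_embed            # (n_categories, signs_per_cat)
    # Automatic sign selection: retain the k best-matching signs per class.
    top = np.sort(sims, axis=1)[:, -top_k:]     # (n_categories, top_k)
    scores = top.mean(axis=1)                   # one score per anomaly class
    return int(np.argmax(scores)), scores

pred, scores = classify(image_embed, sign_embeds)
print("predicted category:", pred)
```

The design choice sketched here (top-k mean rather than a full mean) reflects the paper's motivation: with limited data, some generated sign descriptions align poorly with a given image, and filtering them stabilizes the per-category score.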
Problem

Research questions and friction points this paper is trying to address.

Detecting multiple anomaly categories in medical images with limited data
Leveraging vision-language models for few-shot medical anomaly detection
Aligning radiological signs with anomaly categories to improve accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages vision-language models for few-shot learning
Aligns radiological signs with anomaly categories
Employs automatic sign selection strategy
Authors

Kaiyu Guo (Shanghai Academy of AI for Science; University of Queensland, Brisbane, Australia)
Tan Pan (Fudan University)
Chen Jiang (Shanghai Academy of AI for Science, Shanghai, China)
Zijian Wang (University of Queensland, Brisbane, Australia)
Brian C. Lovell (Professor, The University of Queensland)
Limei Han (Fudan University; Shanghai Academy of AI for Science, Shanghai, China)
Yuan Cheng (Fudan University; Shanghai Academy of AI for Science, Shanghai, China)
Mahsa Baktashmotlagh (University of Queensland)