🤖 AI Summary
Medical anomaly detection faces two coupled challenges: the difficulty of few-shot multi-class recognition and data silos. Existing few-shot methods treat all anomalies as a single class, ignoring inter-class semantic distinctions; conversely, multi-class settings suffer from severe label scarcity, leading to model underfitting and ambiguous decision boundaries. This paper proposes a sign-driven, two-stage few-shot multi-class anomaly detection framework. First, a large language model generates fine-grained radiological sign descriptions; second, joint vision–language modeling achieves cross-modal alignment between signs and images, augmented by an adaptive sign-filtering mechanism to mitigate few-shot uncertainty. The key innovation is the introduction of clinically interpretable radiological signs as intermediate supervisory signals that explicitly encode semantic differences across anomaly classes. Extensive experiments on multiple medical imaging benchmarks demonstrate significant improvements in few-shot multi-class detection accuracy and robustness, validated under three standardized evaluation protocols.
📝 Abstract
Medical anomaly detection (AD) is crucial for early clinical intervention, yet it is hampered by limited access to high-quality medical imaging data, a consequence of privacy concerns and data silos. Few-shot learning has emerged as a promising way to alleviate these limitations by leveraging the large-scale prior knowledge embedded in vision-language models (VLMs). However, recent few-shot medical AD methods have framed the normal-versus-abnormal decision as a one-class classification problem, overlooking the distinctions among multiple anomaly categories. In this paper, we therefore propose a framework tailored for few-shot medical anomaly detection in settings that require identifying multiple anomaly categories. To capture the detailed radiological signs of each anomaly category, our framework incorporates diverse textual descriptions per category generated by a large language model (LLM), under the assumption that different anomalies in medical images may share common radiological signs within a category. Specifically, we introduce SD-MAD, a two-stage Sign-Driven few-shot Multi-Anomaly Detection framework: (i) radiological signs are aligned with anomaly categories by amplifying inter-anomaly discrepancy; (ii) the aligned signs are then further filtered by an automatic sign selection strategy at inference, mitigating the under-fitting and uncertain-sample issues caused by limited medical data. Moreover, we propose three protocols to comprehensively quantify multi-anomaly detection performance. Extensive experiments illustrate the effectiveness of our method.
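The inference pipeline described above (sign–image alignment in a shared VLM embedding space, followed by automatic selection of the best-aligned signs per category) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the embeddings, the cosine-similarity scoring, the top-k sign selection, and the function and parameter names (`classify_by_signs`, `top_k`) are all assumptions standing in for unspecified details of SD-MAD.

```python
import numpy as np

def classify_by_signs(image_emb, sign_embs_per_class, top_k=3):
    """Hypothetical sketch of sign-driven multi-anomaly classification.

    image_emb: (d,) image embedding from a VLM image encoder (assumed given).
    sign_embs_per_class: dict mapping anomaly-category name -> (n_signs, d)
        array of text embeddings of LLM-generated radiological sign
        descriptions for that category.
    top_k: number of best-aligned signs kept per category, a stand-in for
        the paper's automatic sign selection at inference.

    Returns the predicted category and a per-category score dict.
    """
    def cosine(vec, mat):
        # Cosine similarity between one vector and each row of a matrix.
        vec = vec / np.linalg.norm(vec)
        mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        return mat @ vec

    scores = {}
    for cls, signs in sign_embs_per_class.items():
        sims = cosine(image_emb, signs)
        k = min(top_k, sims.shape[0])
        # Keep only the top-k best-aligned signs; averaging them suppresses
        # noisy or poorly matched sign descriptions.
        scores[cls] = float(np.sort(sims)[-k:].mean())
    predicted = max(scores, key=scores.get)
    return predicted, scores

# Toy usage with synthetic 8-dim embeddings: one category's signs are
# constructed to align with the image, the other's are random.
rng = np.random.default_rng(0)
d = 8
image_emb = np.ones(d)
aligned_signs = np.tile(image_emb, (4, 1)) + 0.01 * rng.normal(size=(4, d))
random_signs = rng.normal(size=(4, d))
pred, scores = classify_by_signs(
    image_emb, {"consolidation": aligned_signs, "nodule": random_signs}
)
```

In this sketch, selecting only the top-k aligned signs per category plays the role of the automatic sign selection strategy: signs whose descriptions do not match the image are excluded from the category score rather than diluting it.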