UltraAD: Fine-Grained Ultrasound Anomaly Classification via Few-Shot CLIP Adaptation

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper targets three problems in ultrasound imaging: fine-grained classification of benign versus malignant tumors, precise lesion localization, and poor generalization across devices. It proposes a vision-language few-shot anomaly detection method that builds on the CLIP framework and adapts it with only a small number of annotated samples. Its key contributions are: (1) an image-guided prompt fusion mechanism that fuses the image-level token with learnable text embeddings; (2) a memory bank built from few-shot image samples and text descriptions, whose text embeddings stay frozen during training while image features are adapted to medical data; and (3) patch-level feature refinement that uses the image-informed prompt to sharpen local discrimination. Evaluated on three breast ultrasound datasets, the method outperforms state-of-the-art approaches in both lesion localization and benign/malignant classification, and it is designed to reduce the domain shift caused by heterogeneous ultrasound equipment.

📝 Abstract
Precise anomaly detection in medical images is critical for clinical decision-making. While recent unsupervised or semi-supervised anomaly detection methods trained on large-scale normal data show promising results, they lack fine-grained differentiation, such as benign vs. malignant tumors. Additionally, ultrasound (US) imaging is highly sensitive to variations in devices and acquisition parameters, creating significant domain gaps in the resulting US images. To address these challenges, we propose UltraAD, a vision-language model (VLM)-based approach that leverages few-shot US examples for generalized anomaly localization and fine-grained classification. To enhance localization performance, the image-level token of query visual prototypes is first fused with learnable text embeddings. This image-informed prompt feature is then further integrated with patch-level tokens, refining local representations for improved accuracy. For fine-grained classification, a memory bank is constructed from few-shot image samples and corresponding text descriptions that capture anatomical and abnormality-specific features. During training, the stored text embeddings remain frozen, while image features are adapted to better align with medical data. UltraAD has been extensively evaluated on three breast US datasets, outperforming state-of-the-art methods in both lesion localization and fine-grained medical classification. The code will be released upon acceptance.
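
The localization idea in the abstract (fuse the image-level token with text embeddings, then compare the fused prompt with patch-level tokens) can be illustrated with a small sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the additive fusion with weight `alpha`, the cosine-similarity scoring, the `temperature` value, and the function name `anomaly_scores`. The paper's learnable embeddings and its actual patch-level integration are replaced by fixed toy vectors and a plain similarity.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def anomaly_scores(patch_tokens, image_token, text_embeds, alpha=0.5, temperature=0.07):
    """Return one abnormality probability per patch.

    text_embeds is [normal_prompt, abnormal_prompt]. Hypothetical helper,
    not the authors' implementation.
    """
    # Step 1 (assumed additive fusion): inject the image-level token into each prompt.
    prompts = [
        normalize([t + alpha * g for t, g in zip(text, image_token)])
        for text in text_embeds
    ]
    scores = []
    for patch in patch_tokens:
        # Step 2 (assumed): score each patch by cosine similarity to the fused prompts.
        p = normalize(patch)
        logits = [dot(p, prompt) / temperature for prompt in prompts]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(l - m) for l in logits]
        scores.append(exps[1] / sum(exps))
    return scores

# Toy 2-D example: axis 0 stands for "normal", axis 1 for "abnormal".
text_embeds = [[1.0, 0.0], [0.0, 1.0]]
image_token = [0.1, 0.1]
patch_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
scores = anomaly_scores(patch_tokens, image_token, text_embeds)
```

In this toy setup the first patch scores near 0, the second near 1, and the third sits halfway between the two prompts. Reshaping the per-patch scores to the patch grid would give a coarse anomaly map.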
Problem

Research questions and friction points this paper is trying to address.

Existing anomaly detection methods lack fine-grained differentiation (e.g., benign vs. malignant)
Ultrasound imaging is sensitive to device and acquisition-parameter variations
Anomaly localization and classification need to generalize across domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-shot CLIP adaptation for anomaly classification
Image-informed prompt fusion for localization
Memory bank with frozen text embeddings
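
As a rough illustration of the memory-bank item above, the following sketch classifies a query feature by combining its similarity to frozen class text embeddings with its similarity to stored few-shot image features. The class name `MemoryBank`, the mixing weight `beta`, the max-over-samples image score, and the toy 2-D features are all assumptions for illustration. The sketch covers inference only, so the image-feature adaptation that the abstract describes during training is not shown.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class MemoryBank:
    """Hypothetical few-shot memory bank, not the authors' implementation."""

    def __init__(self, image_feats, labels, text_embeds):
        self.image_feats = [normalize(f) for f in image_feats]
        self.labels = list(labels)
        # Tuples are immutable, which mimics text embeddings that stay frozen.
        self.text_embeds = tuple(tuple(normalize(t)) for t in text_embeds)

    def classify(self, query, beta=0.5):
        """Return (predicted class index, per-class scores).

        Assumes every class has at least one stored image feature.
        """
        q = normalize(query)
        scores = []
        for c, text in enumerate(self.text_embeds):
            text_sim = dot(q, text)
            # Best match among the few-shot image features of class c.
            image_sim = max(
                dot(q, f) for f, y in zip(self.image_feats, self.labels) if y == c
            )
            scores.append(beta * text_sim + (1 - beta) * image_sim)
        return scores.index(max(scores)), scores

# Toy example: class 0 = benign, class 1 = malignant.
bank = MemoryBank(
    image_feats=[[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]],
    labels=[0, 0, 1, 1],
    text_embeds=[[1.0, 0.0], [0.0, 1.0]],
)
pred_malignant, _ = bank.classify([0.2, 1.0])
pred_benign, _ = bank.classify([1.0, 0.05])
```

Keeping the text side fixed means only the image features would need to move toward the class descriptions during adaptation, which is one plausible reading of why the abstract freezes the stored text embeddings.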
Yue Zhou
Computer Aided Medical Procedures (CAMP), TU Munich, Germany; Munich Center for Machine Learning (MCML), Munich, Germany
Yuan Bi
Technical University of Munich
Robotic Ultrasound, Ultrasound Image Processing
Wenjuan Tong
The First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
Wei Wang
The First Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China
Nassir Navab
Professor of Computer Science, Technische Universität München
Zhongliang Jiang
University of Hong Kong
Medical Robotics, Ultrasound imaging, Robot learning, Surgical Robotics, Human-robot Interaction