Deep Modeling and Optimization of Medical Image Classification

📅 2025-04-14
🏛️ IEEE International Symposium on Biomedical Imaging
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the dual challenges of few-shot learning and data privacy in medical imaging, this paper proposes a synergistic classification framework integrating a multimodal CLIP variant, federated learning (FL), and support vector machines (SVM) for accurate brain and skin cancer identification. Methodologically, the authors design a hybrid CLIP architecture comprising four CNNs and eight Vision Transformers (ViTs), conduct a first systematic study of adapting CLIP to medical FL settings, and incorporate an SVM to enhance cross-domain generalization. Experiments demonstrate an average test performance of 87.03% on HAM10000 (achieved by MaxViT), a federated F1-score of 83.98% for convnext_1, and an approximate 2% average accuracy gain for Swin-based models on ISIC2018 with SVM integration. This work establishes a scalable, highly generalizable multimodal federated learning paradigm for privacy-sensitive, few-shot medical image classification.
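The CLIP-style classification the summary describes works by comparing an image embedding against text embeddings of class prompts and taking a softmax over the cosine similarities. A minimal NumPy sketch of that decision rule follows; the random stand-in embeddings, the prompt wording, and the `zero_shot_probs` helper are illustrative assumptions, not the paper's actual encoders or models:

```python
import numpy as np

def zero_shot_probs(image_emb, text_embs, temperature=0.07):
    """CLIP-style classification: cosine similarity between one image
    embedding and one text embedding per class, then a softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()

# Stand-in embeddings; a real pipeline would run CLIP's image encoder on the
# dermatoscopic image and its text encoder on prompts such as
# "a dermatoscopic image of melanoma".
rng = np.random.default_rng(0)
classes = ["melanoma", "nevus", "basal cell carcinoma"]
text_embs = rng.normal(size=(len(classes), 128))
image_emb = text_embs[0] + 0.1 * rng.normal(size=128)  # close to "melanoma"

probs = zero_shot_probs(image_emb, text_embs)
print(classes[int(np.argmax(probs))])
```

Swapping the image encoder (one of the four CNNs or eight ViTs in the paper) only changes how `image_emb` is produced; the similarity-and-softmax decision rule stays the same.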

📝 Abstract
Deep models, such as convolutional neural networks (CNNs) and vision transformers (ViTs), demonstrate remarkable performance in image classification. However, these deep models require large amounts of data for fine-tuning, which is impractical in the medical domain due to data privacy concerns. Furthermore, despite the strong performance of contrastive language-image pre-training (CLIP) in the natural-image domain, the potential of CLIP has not been fully investigated in the medical field. To address these challenges, we consider three scenarios: 1) we introduce a novel CLIP variant using four CNNs and eight ViTs as image encoders for the classification of brain cancer and skin cancer; 2) we combine 12 deep models with two federated learning techniques to protect data privacy; and 3) we employ traditional machine learning (ML) methods to improve the generalization ability of these deep models on unseen domain data. The experimental results indicate that maxvit achieves the highest averaged (AVG) test metrics (AVG = 87.03%) on the HAM10000 dataset with multimodal learning, while convnext_1 achieves a remarkable test F1-score of 83.98%, compared to 81.33% for swin_b, in the FL setting. Furthermore, using a support vector machine (SVM) improves the overall test metrics by an AVG of ~2% for the Swin Transformer series on ISIC2018. Our code is available at https://github.com/AIPMLab/SkinCancerSimulation.
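Scenario 3 in the abstract, fitting an SVM on top of deep features to improve cross-domain generalization, can be sketched as below. The Gaussian clusters stand in for embeddings from a frozen encoder (the real pipeline would use features from one of the paper's CNN/ViT backbones); the class shift, dimensions, and RBF kernel choice are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in for frozen-encoder embeddings: two Gaussian clusters in a
# 512-dim feature space, mimicking benign vs. malignant image features.
n_per_class, dim = 200, 512
benign = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, dim))
malignant = rng.normal(loc=0.6, scale=1.0, size=(n_per_class, dim))
X = np.vstack([benign, malignant])
y = np.array([0] * n_per_class + [1] * n_per_class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Standardize the features, then fit an RBF-kernel SVM on top of them.
scaler = StandardScaler().fit(X_train)
clf = SVC(kernel="rbf", C=1.0).fit(scaler.transform(X_train), y_train)
acc = clf.score(scaler.transform(X_test), y_test)
print(f"held-out accuracy: {acc:.3f}")
```

Because the SVM only sees fixed feature vectors, it can be retrained cheaply per target domain without touching the deep encoder, which is one plausible reading of why it helps on unseen-domain data such as ISIC2018.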
Problem

Research questions and friction points this paper is trying to address.

Optimizing medical image classification with limited data
Exploring CLIP potential in medical imaging applications
Enhancing privacy via federated learning in healthcare
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel CLIP variant with CNNs and ViTs
Combines deep models with federated learning
Enhances generalization using traditional ML methods
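The federated learning bullet above hinges on aggregating locally trained models without sharing patient data. The abstract does not name the two FL techniques used, so the sketch below shows generic FedAvg-style weighted parameter averaging as one common baseline; the `fedavg` helper, layer names, and simulated hospital sizes are all illustrative assumptions:

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """FedAvg-style aggregation: average each parameter tensor across
    clients, weighted by the number of local training samples.

    client_weights: list of dicts mapping layer name -> np.ndarray
    client_sizes:   number of local samples held by each client
    """
    total = sum(client_sizes)
    return {
        key: sum(w[key] * (n / total) for w, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }

# Three simulated hospitals with different local data volumes; only model
# parameters leave each site, never the images themselves.
rng = np.random.default_rng(1)
clients = [{"fc.weight": rng.normal(size=(4, 4))} for _ in range(3)]
sizes = [100, 300, 600]
global_weights = fedavg(clients, sizes)
print(global_weights["fc.weight"].shape)  # (4, 4)
```

In a full round, the server would broadcast `global_weights` back to every site for the next epoch of local training, repeating until convergence.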