Vision-Language Model Based Multi-Expert Fusion for CT Image Classification

📅 2026-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses insufficient classification robustness in multi-institutional COVID-19 CT imaging, which stems from data distribution shift, class imbalance, and unknown test-set origins. To tackle this, the authors propose a three-stage source-aware multi-expert fusion framework: first, a lung-aware 3D CNN performs volume-level classification; second, MedSigLIP vision-language experts, paired with a Transformer, capture slice-level representations and cross-slice contextual information; third, a source classifier predicts the origin of each test sample to dynamically weight expert outputs. The work integrates vision-language models with a source-aware multi-expert mechanism and a hierarchical voting strategy to improve generalization across heterogeneous multi-source data. On a validation set spanning four data sources, the Stage 1 model achieves a macro-F1 of 0.9711, the MedSigLIP-based experts reach AUCs of up to 0.9864, and the source classifier attains 0.9107 accuracy, demonstrating markedly improved robustness.
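The summary describes source-predicted weights being used to fuse expert outputs, but does not spell out the fusion rule. A minimal sketch of one plausible reading, assuming each expert's per-source reliability (e.g. its validation macro-F1 on each source) is available as a matrix, could look like:

```python
import numpy as np

def source_aware_fusion(expert_probs, source_probs, reliability):
    """Fuse per-expert class probabilities using predicted source identity.

    expert_probs : (E, C) array, class probabilities from each of E experts.
    source_probs : (S,) array, source classifier's probabilities over S sources.
    reliability  : (S, E) array, per-source weight of each expert
                   (illustrative; e.g. validation macro-F1 per source).
    Returns the fused (C,) class probability vector.
    """
    # Expected expert weight under the predicted source distribution.
    weights = source_probs @ reliability          # shape (E,)
    weights = weights / weights.sum()             # normalize to a convex combination
    return weights @ expert_probs                 # shape (C,)
```

All names and the reliability-matrix construction here are assumptions for illustration; the paper's actual fusion may use hard source assignment or voting instead of soft averaging.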

📝 Abstract
Robust detection of COVID-19 from chest CT remains challenging in multi-institutional settings due to substantial source shift, source imbalance, and hidden test-source identities. In this work, we propose a three-stage source-aware multi-expert framework for multi-source COVID-19 CT classification. First, we build a lung-aware 3D expert by combining original CT volumes and lung-extracted CT volumes for volumetric classification. Second, we develop two MedSigLIP-based experts: a slice-wise representation and probability learning module, and a Transformer-based inter-slice context modeling module for capturing cross-slice dependencies. Third, we train a source classifier to predict the latent source identity of each test scan. Leveraging the predicted source information, we perform model fusion and voting across the different experts. On the validation set covering all four sources, the Stage 1 model achieves the best macro-F1 of 0.9711, accuracy of 0.9712, and AUC of 0.9791. Stage 2a and Stage 2b achieve the best AUC scores of 0.9864 and 0.9854, respectively. The Stage 3 source classifier reaches 0.9107 accuracy and 0.9114 F1. These results demonstrate that source-aware expert modeling and hierarchical voting provide an effective solution for robust COVID-19 CT classification under heterogeneous multi-source conditions.
Problem

Research questions and friction points this paper is trying to address.

source shift
source imbalance
hidden test-source identities
multi-institutional CT classification
robust COVID-19 detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-expert fusion
source-aware learning
MedSigLIP
3D lung-aware modeling
inter-slice context modeling
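The innovations above culminate in a hierarchical voting step across the experts, whose exact procedure is not detailed on this page. One hedged sketch, in which each expert's hard prediction is tallied with a source-dependent weight (the expert names and weights below are purely illustrative), is:

```python
from collections import Counter

def hierarchical_vote(expert_preds, source_weights):
    """Weighted majority vote over expert predictions.

    expert_preds   : dict mapping expert name -> predicted class label.
    source_weights : dict mapping expert name -> weight under the predicted
                     source (illustrative; unknown experts default to 1.0).
    Returns the label with the highest weighted tally.
    """
    tally = Counter()
    for name, label in expert_preds.items():
        tally[label] += source_weights.get(name, 1.0)
    return tally.most_common(1)[0][0]
```

For example, with experts `{"3d_cnn": 1, "slice": 0, "context": 1}` and weights favoring the slice expert only slightly, the two agreeing experts still carry the vote. The paper's actual scheme may vote within stages before fusing across them.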
Jianfa Bai
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Kejin Lu
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Runtian Yuan
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Qingqiu Li
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Jilan Xu
Fudan University
Computer Vision · Multimodal · Medical Image Analysis
Junlin Hou
HKUST | Fudan University
Computer Vision · Medical Image Analysis · Label-efficient Deep Learning · eXplainable AI
Yuejie Zhang
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Rui Feng
College of Computer Science and Artificial Intelligence, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University