EMeRALDS: Electronic Medical Record Driven Automated Lung Nodule Detection and Classification in Thoracic CT Images

📅 2025-09-15
🤖 AI Summary
Early lung cancer diagnosis is hindered by the limited accuracy of pulmonary nodule detection and classification in CT scans. To address this, we propose an end-to-end computer-aided diagnosis (CAD) system that combines vision-language modeling, radiomics, and synthetic structured electronic medical records (EMRs). A zero-shot nodule segmentation module replaces the standard visual prompt of SAM2 with a clinical text prompt encoded by CLIP, so nodules are delineated without manual annotations. The diagnosis module then computes similarity scores between the segmented nodules and radiomic features and combines them with synthetic EMRs, generated from expert radiologists' radiomic assessments, to make classification more interpretable and generalizable. Evaluated on the LIDC-IDRI dataset (1,018 CT scans), segmentation achieves a Dice score of 0.92 and an IoU of 0.85, and malignancy classification attains a specificity of 0.97, surpassing existing fully supervised methods. This work is the first to jointly leverage text-guided segmentation and structured clinical context modeling for pulmonary nodule diagnosis, which improves clinical applicability when annotation budgets are limited.

📝 Abstract
Objective: Lung cancer is a leading cause of cancer-related mortality worldwide, primarily due to delayed diagnosis and poor early detection. This study aims to develop a computer-aided diagnosis (CAD) system that leverages large vision-language models (VLMs) for the accurate detection and classification of pulmonary nodules in computed tomography (CT) scans.
Methods: We propose an end-to-end CAD pipeline consisting of two modules: (i) a detection module (CADe) based on the Segment Anything Model 2 (SAM2), in which the standard visual prompt is replaced with a text prompt encoded by CLIP (Contrastive Language-Image Pretraining), and (ii) a diagnosis module (CADx) that calculates similarity scores between segmented nodules and radiomic features. To add clinical context, synthetic electronic medical records (EMRs) were generated using radiomic assessments by expert radiologists and combined with similarity scores for final classification. The method was tested on the publicly available LIDC-IDRI dataset (1,018 CT scans).
Results: The proposed approach demonstrated strong performance in zero-shot lung nodule analysis. The CADe module achieved a Dice score of 0.92 and an IoU of 0.85 for nodule segmentation. The CADx module attained a specificity of 0.97 for malignancy classification, surpassing existing fully supervised methods.
Conclusions: The integration of VLMs with radiomics and synthetic EMRs allows for accurate and clinically relevant CAD of pulmonary nodules in CT scans. The proposed system shows strong potential to enhance early lung cancer detection, increase diagnostic confidence, and improve patient management in routine clinical workflows.
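The three metrics reported in the Results (Dice, IoU, specificity) have standard definitions, which the following minimal sketch spells out. It is not the authors' evaluation code: the function names `dice_and_iou` and `specificity`, the nested-list mask format, and the toy data are all assumptions made for illustration.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    flat_pred = [bool(v) for row in pred for v in row]
    flat_truth = [bool(v) for row in truth for v in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p and t for p, t in zip(flat_pred, flat_truth))
    pred_area = sum(flat_pred)
    truth_area = sum(flat_truth)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


def specificity(predicted, actual):
    """True-negative rate TN / (TN + FP), with 1 = malignant and 0 = benign."""
    tn = sum(1 for p, a in zip(predicted, actual) if a == 0 and p == 0)
    fp = sum(1 for p, a in zip(predicted, actual) if a == 0 and p == 1)
    if tn + fp == 0:
        raise ValueError("specificity is undefined without benign cases")
    return tn / (tn + fp)


# Toy masks (hypothetical): the prediction misses one ground-truth pixel.
pred_mask = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 0, 0, 0]]
true_mask = [[0, 1, 1, 0],
             [0, 1, 1, 1],
             [0, 0, 0, 0]]
dice, iou = dice_and_iou(pred_mask, true_mask)

# Toy labels (hypothetical): three benign cases, one of them flagged malignant.
spec = specificity([0, 0, 1, 1, 0], [0, 0, 0, 1, 1])
```

For a single pair of masks the two overlap scores are tied by IoU = Dice / (2 - Dice). A Dice of 0.92 corresponds to an IoU of about 0.852, so the reported 0.92 and 0.85 are mutually consistent, although the identity need not hold exactly for scores averaged over many nodules.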
Problem

Research questions and friction points this paper is trying to address.

Automated detection and classification of lung nodules
Leveraging vision-language models for CT scan analysis
Improving early lung cancer diagnosis accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

VLMs with text prompts for nodule detection
Similarity scores with radiomic features for diagnosis
Synthetic EMRs combined for clinical context classification
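The abstract does not say how the similarity scores are computed or how they are combined with the synthetic EMRs, so the sketch below only illustrates one plausible arrangement: cosine similarity between a nodule embedding and embeddings of radiomic descriptors, concatenated with numeric EMR fields into one feature vector for a downstream classifier. Every name here (`cosine_similarity`, `similarity_scores`, `fuse`, the descriptor labels, the EMR fields) is hypothetical, and the three-dimensional vectors stand in for real CLIP embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_scores(nodule_embedding, descriptor_embeddings):
    """Score one nodule embedding against each named descriptor embedding."""
    return {
        name: cosine_similarity(nodule_embedding, vector)
        for name, vector in descriptor_embeddings.items()
    }


def fuse(scores, emr_features):
    """Concatenate scores and EMR fields in sorted-key order (assumed fusion)."""
    return ([scores[key] for key in sorted(scores)]
            + [emr_features[key] for key in sorted(emr_features)])


# Toy inputs (hypothetical): real embeddings would come from an image/text encoder.
nodule = [1.0, 0.0, 1.0]
descriptors = {
    "spiculated": [1.0, 0.0, 1.0],
    "calcified": [0.0, 1.0, 0.0],
}
emr = {"age_normalized": 0.7, "smoker": 1.0}

scores = similarity_scores(nodule, descriptors)
features = fuse(scores, emr)  # input vector for any downstream classifier
```

Sorting the keys keeps the feature order stable across cases, which matters once the vector is passed to a classifier that expects fixed column positions.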