Minimum Data, Maximum Impact: 20 annotated samples for explainable lung nodule classification

📅 2025-08-01

🤖 AI Summary

To address the scarcity of pathology-oriented visual attribute annotations (e.g., spiculation, lobulation, vacuole), which limits the performance of interpretable models for pulmonary nodule classification, this work proposes a few-shot, attribute-conditioned diffusion model for medical image synthesis. Trained on only 20 real attribute-annotated lung nodule samples from the LIDC-IDRI dataset, the diffusion model generates nodule images conditioned on clinically relevant pathological attributes. The synthesized data augment the training of an interpretable classification model, preserving decision transparency while improving performance relative to training on the small real dataset alone: attribute prediction accuracy increases by 13.4%, and benign/malignant (target) classification accuracy improves by 1.8%. To our knowledge, this is the first study to apply ultra-low-shot conditional diffusion modeling to generate radiologist-defined clinical visual attributes. The approach offers a practical route to deploying interpretable AI in medical imaging when attribute annotations are scarce.
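The summary does not spell out how the attribute conditioning is implemented. As a minimal sketch, assuming a standard DDPM-style setup (linear noise schedule, noise-prediction loss) in which the attribute vector is simply passed to the denoiser as an extra input, one training step could look like the following. The names `linear_alpha_bars`, `q_sample`, `conditional_loss`, and the `eps_model(x_t, t, attributes)` signature are hypothetical, and plain lists stand in for image tensors.

```python
import math
import random


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Cumulative products alpha_bar_t for a linear beta schedule (DDPM-style, assumed)."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, noise):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def conditional_loss(eps_model, x0, attributes, alpha_bars, rng):
    """One Monte Carlo sample (one t, one noise draw) of the attribute-conditioned noise-prediction loss."""
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = q_sample(x0, t, alpha_bars, noise)
    pred = eps_model(x_t, t, attributes)  # hypothetical denoiser signature: sees the attributes
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(x0)


# Toy usage: a flattened "nodule patch" and a hypothetical binary attribute vector.
rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
x0 = [0.1, -0.3, 0.5, 0.0]
attrs = [1.0, 0.0, 0.0]
zero_model = lambda x_t, t, a: [0.0] * len(x_t)  # placeholder network that ignores its inputs
loss = conditional_loss(zero_model, x0, attrs, alpha_bars, rng)
```

In a real model, `eps_model` would be a neural network (for example a U-Net) that embeds the attribute vector and injects it into its layers; the zero model here only exercises the loss computation.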

📝 Abstract

Classification models that provide human-interpretable explanations enhance clinicians' trust and improve usability in medical image diagnosis. One research focus is the integration and prediction of pathology-related visual attributes used by radiologists alongside the diagnosis, aligning AI decision-making with clinical reasoning. Radiologists use attributes such as shape and texture as established diagnostic criteria, and mirroring these in AI decision-making both enhances transparency and enables explicit validation of model outputs. However, the adoption of such models is limited by the scarcity of large-scale medical image datasets annotated with these attributes. To address this challenge, we propose synthesizing attribute-annotated data using a generative model. We extend a diffusion model with attribute conditioning and train it using only 20 attribute-labeled lung nodule samples from the LIDC-IDRI dataset. Incorporating its generated images into the training of an explainable model boosts performance, increasing attribute prediction accuracy by 13.4% and target prediction accuracy by 1.8% compared with training on the small real attribute-annotated dataset alone. This work highlights the potential of synthetic data to overcome dataset limitations, enhancing the applicability of explainable models in medical image analysis.
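The abstract describes an explainable model that predicts radiologist-defined attributes alongside the diagnosis, but it does not give the architecture. One common way to tie the two outputs together is a concept-bottleneck-style design, in which the diagnosis is computed from the predicted attributes and both outputs are supervised. The sketch below assumes that design, binarized attribute labels (LIDC-IDRI itself rates attributes on multi-level scales), and a hypothetical weighting parameter `attr_weight`; it illustrates the idea and is not the paper's model.

```python
import math


def bce(prob: float, label: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one probability/label pair, clipped for numerical safety."""
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def target_from_attributes(attr_probs, weights, bias):
    """Bottleneck head (assumed): the malignancy score depends only on the predicted attributes."""
    z = bias + sum(w * a for w, a in zip(weights, attr_probs))
    return 1.0 / (1.0 + math.exp(-z))


def joint_loss(attr_probs, attr_labels, target_prob, target_label, attr_weight=1.0):
    """Target loss plus the weighted mean attribute loss."""
    attr_term = sum(bce(p, y) for p, y in zip(attr_probs, attr_labels)) / len(attr_labels)
    return bce(target_prob, target_label) + attr_weight * attr_term


# Toy usage with two hypothetical binarized attributes and made-up head parameters.
attr_probs = [0.8, 0.3]
target_p = target_from_attributes(attr_probs, weights=[2.0, 1.0], bias=-1.5)
loss = joint_loss(attr_probs, [1.0, 0.0], target_p, 1.0, attr_weight=0.5)
```

Because the target head sees only the attribute predictions, a clinician can inspect which attributes drove a given malignancy score; synthetic attribute-labeled images mainly help the attribute term, which is consistent with the larger gain the abstract reports for attribute accuracy.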

Problem

Research questions and friction points this paper is trying to address.

Lack of large medical datasets with annotated visual attributes

Need for explainable AI models aligning with clinical reasoning

Improving accuracy with minimal real annotated lung nodule samples

Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribute-conditioned Diffusion Model for data synthesis

Only 20 attribute-annotated samples needed for explainable classification

Synthetic data boosts attribute prediction accuracy by 13.4% and target prediction accuracy by 1.8%
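To steer which attributes appear in a generated nodule, conditional diffusion models are often sampled with classifier-free guidance. Whether this paper uses it is not stated, so treat the following as an assumed technique: during training the attribute vector is replaced by a null value with probability `p_uncond`, and at sampling time the conditional and unconditional noise predictions are blended with a guidance scale. The function names, `p_uncond`, and the zero-vector null token are hypothetical choices.

```python
import random


def drop_condition(attributes, p_uncond, rng, null_value=0.0):
    """Training-time condition dropout: with probability p_uncond, swap attributes for a null token.

    Using an all-zero vector as the null token is a simplification; a real model would
    typically use a dedicated learned embedding so "null" cannot collide with real labels.
    """
    if rng.random() < p_uncond:
        return [null_value] * len(attributes)
    return list(attributes)


def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: eps_u + w * (eps_c - eps_u), applied elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy usage with a hypothetical binary attribute encoding and made-up noise predictions.
rng = random.Random(42)
attrs = [1.0, 0.0, 1.0]
train_attrs = drop_condition(attrs, p_uncond=0.1, rng=rng)
eps = guided_noise([0.1, -0.2], [0.3, 0.0], guidance_scale=2.0)
```

A guidance scale of 1 recovers the purely conditional prediction and 0 the unconditional one; values above 1 push samples more strongly toward the requested attributes, usually at some cost in sample diversity.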
