🤖 AI Summary
In lung cancer diagnosis, precisely describing the morphology of pulmonary nodules in CT scans and integrating that description with clinical knowledge remains challenging, which limits the interpretability and clinical reliability of multimodal large models. To address this, we propose a collaborative multi-agent system that decomposes diagnosis into three sequential stages: nodule detection, fine-grained morphological description, and benign-malignant reasoning. We introduce a novel region-level semantic alignment mechanism, enabling the first end-to-end deep integration of clinical knowledge bases with vision-language models (VLMs). The system combines a dedicated detection model, a local image descriptor, a VLM, and a pathology-knowledge-driven reasoning module. Evaluated on the public LIDC-IDRI dataset and two private datasets, our method outperforms state-of-the-art vision-language models and conventional expert systems in nodule description accuracy, malignancy classification AUC, and clinician interpretability.
📝 Abstract
Diagnosing lung cancer typically involves physicians identifying lung nodules in computed tomography (CT) scans and writing diagnostic reports based on the nodules' morphological features and their own medical expertise. Although multimodal large language models have advanced the analysis of lung CT scans, they still struggle to describe nodule morphology accurately and to incorporate medical expertise. These limitations reduce the reliability and effectiveness of such models in clinical settings. Collaborative multi-agent systems offer a promising way to balance generality and precision in medical applications, yet their potential in pathology has not been thoroughly explored. To bridge these gaps, we introduce LungNoduleAgent, a collaborative multi-agent system designed specifically for analyzing lung CT scans. LungNoduleAgent breaks the diagnostic process into sequential components, improving precision in nodule description and malignancy grading through three primary modules. The first module, the Nodule Spotter, coordinates clinical detection models to identify nodules accurately. The second module, the Radiologist, integrates localized image description techniques to produce comprehensive CT reports. Finally, the Doctor Agent System performs malignancy reasoning from the images and CT reports, supported by a pathology knowledge base and a multi-agent framework. Extensive testing on two private datasets and the public LIDC-IDRI dataset indicates that LungNoduleAgent surpasses mainstream vision-language models, agent systems, and advanced expert models. These results highlight the importance of region-level semantic alignment and multi-agent collaboration in nodule diagnosis. LungNoduleAgent is a promising foundational tool for supporting clinical analysis of lung nodules.
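The three-module flow described above can be pictured as a simple chain: detect regions, describe each region, then reason over the image and its report. The following is a minimal sketch of that data flow only, not the authors' implementation. Every name in it (`NoduleFinding`, `run_pipeline`, the `toy_*` functions, and the bounding-box layout) is a hypothetical placeholder, and the toy stages stand in for the detection models, the localized description model, and the knowledge-backed multi-agent reasoning, none of which are specified in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names here are hypothetical; they illustrate the data flow only.
BBox = Tuple[int, int, int, int]  # assumed layout: (x, y, width, height) in pixels


@dataclass
class NoduleFinding:
    bbox: BBox
    description: str = ""
    malignancy: str = ""


def run_pipeline(
    ct_slice: object,
    spot: Callable[[object], List[BBox]],
    describe: Callable[[object, BBox], str],
    reason: Callable[[object, str], str],
) -> List[NoduleFinding]:
    """Chain the three stages: detect, describe each region, then reason."""
    findings = []
    for bbox in spot(ct_slice):              # stage 1: Nodule Spotter
        report = describe(ct_slice, bbox)    # stage 2: Radiologist (region-level report)
        grade = reason(ct_slice, report)     # stage 3: Doctor Agent System
        findings.append(NoduleFinding(bbox, report, grade))
    return findings


# Toy stand-ins so the sketch runs without any model or image data.
def toy_spot(ct_slice: object) -> List[BBox]:
    return [(40, 52, 12, 12)]


def toy_describe(ct_slice: object, bbox: BBox) -> str:
    return f"Solid nodule, {bbox[2]} px wide, spiculated margin."


def toy_reason(ct_slice: object, report: str) -> str:
    # Placeholder keyword rule for illustration; not a clinical criterion.
    return "suspicious" if "spiculated" in report else "likely benign"


findings = run_pipeline(None, toy_spot, toy_describe, toy_reason)
for finding in findings:
    print(finding.bbox, finding.description, finding.malignancy)
```

Passing the stages in as callables keeps the sketch independent of any particular detector or vision-language model; in a real system each callable would wrap a model or an agent rather than return fixed values.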