🤖 AI Summary
Current large language models (LLMs) exhibit formatting errors, factual hallucinations, and privacy leakage risks when structuring radiology reports. To address these challenges, we propose an open-source, on-premise framework for low-dose CT lung cancer screening (LCS) reports that maps free-text reports from multiple institutions into fully structured, standardized outputs covering 27 lung nodule features. Our method introduces a dynamic template-constrained decoding mechanism that combines an expert-defined standardized template, structured temporal modeling, and local inference, so that outputs contain neither formatting errors nor content hallucinations and the data never leave the institution. Evaluated on cross-institutional datasets, our approach achieves an F1 score of about 97% and outperforms GPT-4o by 17.19%. It further enables Lung-RADS distribution analytics and nodule-level retrieval. The implementation is publicly available.
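The idea behind template-constrained decoding can be sketched in a few lines. This is a minimal illustration, not the released implementation: the three fields, their allowed values, and the `keyword_scorer` function are hypothetical stand-ins (a real system would score candidates with the LLM and would cover all 27 features), and the sketch omits the dynamic adaptation of the template to each report.

```python
from typing import Callable, Dict, List

# Hypothetical template: field name -> allowed values.
# These three fields are illustrative only, not the 27-feature template.
TEMPLATE: Dict[str, List[str]] = {
    "attenuation": ["solid", "part-solid", "ground-glass", "not stated"],
    "location": ["RUL", "RML", "RLL", "LUL", "LLL", "not stated"],
    "stability": ["new", "stable", "increased", "decreased", "not stated"],
}

Scorer = Callable[[str, str, str], float]


def keyword_scorer(report: str, field: str, candidate: str) -> float:
    """Stand-in for a model score: longer literal matches score higher."""
    if candidate == "not stated":
        return 0.5  # fallback beats a non-match but loses to any match
    return float(len(candidate)) if candidate.lower() in report.lower() else 0.0


def constrained_decode(
    report: str, template: Dict[str, List[str]], scorer: Scorer
) -> Dict[str, str]:
    """Fill every template field with the best-scoring allowed value.

    The output can only contain template keys and template values, so it
    cannot be malformed or contain a value outside the template.
    """
    return {
        field: max(allowed, key=lambda c: scorer(report, field, c))
        for field, allowed in template.items()
    }


example = constrained_decode(
    "Stable part-solid nodule in the RUL, 6 mm.", TEMPLATE, keyword_scorer
)
```

Because the decoder chooses only among values the template permits, formatting errors are ruled out by construction; whether the chosen value is correct still depends on the quality of the scorer.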
📝 Abstract
Current LLMs for creating fully structured reports suffer from formatting errors, content hallucinations, and privacy leakage when data are uploaded to external servers. We aim to develop an open-source, accurate LLM that creates fully structured and standardized lung cancer screening (LCS) reports from free-text reports that vary across institutions, and to demonstrate its utility in automatic statistical analysis and individual lung nodule retrieval. With IRB approvals, our retrospective study included 5,442 de-identified low-dose CT (LDCT) LCS radiology reports from two institutions. We constructed two evaluation datasets by labeling 500 pairs of free-text and fully structured radiology reports, and one large-scale consecutive dataset covering January 2021 to December 2023. Two radiologists created a standardized template for recording 27 lung nodule features on LCS. We designed a dynamic-template-constrained decoding method that enhances existing LLMs for creating fully structured reports from free-text radiology reports. Using the consecutive structured reports, we automated descriptive statistical analyses and built a nodule retrieval prototype. Our best LLM achieved an F1 score of about 97% on cross-institutional datasets, with neither formatting errors nor content hallucinations. Our method consistently improved the best open-source LLMs by up to 10.42% and outperformed GPT-4o by 17.19%. The automatically derived statistical distributions were consistent with prior findings regarding attenuation, location, size, stability, and Lung-RADS. The retrieval system built on structured reports allowed flexible nodule-level search and complex statistical analysis. Our software is publicly available for local deployment and further research.
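Once reports are fully structured, nodule-level search and descriptive statistics reduce to simple filtering and counting. The sketch below assumes a hypothetical `Nodule` record with five fields and uses made-up example records; it does not reflect the study's data or the schema of the released software.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Nodule:
    """Hypothetical structured record for one nodule (illustrative fields)."""
    report_id: str
    attenuation: str
    location: str
    size_mm: float
    lung_rads: str


def search(nodules: List[Nodule], predicate: Callable[[Nodule], bool]) -> List[Nodule]:
    """Return every nodule that satisfies an arbitrary predicate."""
    return [n for n in nodules if predicate(n)]


def lung_rads_distribution(nodules: List[Nodule]) -> Counter:
    """Count nodules per Lung-RADS category."""
    return Counter(n.lung_rads for n in nodules)


# Made-up records for demonstration only.
NODULES = [
    Nodule("r1", "solid", "RUL", 6.0, "3"),
    Nodule("r1", "ground-glass", "LLL", 4.0, "2"),
    Nodule("r2", "part-solid", "RUL", 9.0, "4A"),
    Nodule("r3", "solid", "RML", 3.0, "2"),
]

# Nodule-level query: solid nodules of at least 5 mm.
large_solid = search(NODULES, lambda n: n.attenuation == "solid" and n.size_mm >= 5)

# Descriptive statistic: distribution over Lung-RADS categories.
distribution = lung_rads_distribution(NODULES)
```

Passing the query as a predicate keeps the search flexible: any combination of structured fields can be filtered without changing the retrieval function.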