Development and Validation of a Dynamic-Template-Constrained Large Language Model for Generating Fully-Structured Radiology Reports

📅 2024-09-26

🤖 AI Summary

Current large language models (LLMs) produce formatting errors and content hallucinations when structuring radiology reports, and they risk privacy leakage when data are uploaded to external servers. To address these challenges, we propose an open-source, locally deployable framework for low-dose CT (LDCT) lung cancer screening (LCS) reports that maps multi-institutional free-text reports into fully structured, standardized outputs covering 27 lung nodule features. Our method introduces a dynamic-template-constrained decoding mechanism that combines a radiologist-defined standardized template with local inference, so that output follows the template and data stay on-premise. Evaluated on cross-institutional datasets, our best model achieves an F1 score of about 97% with neither formatting errors nor content hallucinations, outperforming GPT-4o by 17.19%. The structured output further enables automated statistical analysis, including Lung-RADS distributions, and nodule-level retrieval. The software is publicly available for local deployment.

📝 Abstract

Current LLMs for creating fully-structured reports face formatting errors, content hallucinations, and privacy leakage when data are uploaded to external servers. We aim to develop an open-source, accurate LLM for creating fully-structured and standardized lung cancer screening (LCS) reports from varying free-text reports across institutions, and to demonstrate its utility in automatic statistical analysis and individual lung nodule retrieval. With IRB approvals, our retrospective study included 5,442 de-identified low-dose CT (LDCT) LCS radiology reports from two institutions. We constructed two evaluation datasets by labeling 500 pairs of free-text and fully-structured radiology reports, and one large-scale consecutive dataset covering January 2021 to December 2023. Two radiologists created a standardized template for recording 27 lung nodule features on LCS. We designed a dynamic-template-constrained decoding method to enhance existing LLMs for creating fully-structured reports from free-text radiology reports. Using the consecutive structured reports, we automated descriptive statistical analyses and built a nodule retrieval prototype. Our best LLM for creating fully-structured reports achieved high performance on cross-institutional datasets, with an F1 score of about 97% and neither formatting errors nor content hallucinations. Our method consistently improved the best open-source LLMs by up to 10.42% and outperformed GPT-4o by 17.19%. The automatically derived statistical distributions were consistent with prior findings regarding attenuation, location, size, stability, and Lung-RADS. The retrieval system built on structured reports allowed flexible nodule-level search and complex statistical analysis. Our software is publicly available for local deployment and further research.
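The abstract reports an F1 score but does not say how it is computed. One common choice for slot-filling tasks is a micro-averaged F1 over field values, sketched below. The function name `micro_f1` and the field names in the example are hypothetical and are not taken from the paper; the paper's actual metric may differ.

```python
from typing import Dict, List, Optional

Report = Dict[str, Optional[str]]


def micro_f1(gold_reports: List[Report], pred_reports: List[Report]) -> float:
    """Micro-averaged F1 over field values (an assumed metric definition).

    A predicted value counts as a true positive only if it exactly matches
    the gold value for the same field. A wrong value counts as both a false
    positive and a false negative; a missing value counts as a false negative.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_reports, pred_reports):
        for field in set(gold) | set(pred):
            g, p = gold.get(field), pred.get(field)
            if p is not None and p == g:
                tp += 1
            else:
                if p is not None:
                    fp += 1
                if g is not None:
                    fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: one correct field, one wrong field, one missing field.
gold = [{"attenuation": "solid", "location": "RUL", "size_mm": "6"}]
pred = [{"attenuation": "solid", "location": "LLL"}]
score = micro_f1(gold, pred)
```

Here precision is 1/2 and recall is 1/3, so `score` is 0.4.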

Problem

Research questions and friction points this paper is trying to address.

Developing an open-source LLM for structured radiology reports

Addressing formatting errors and content hallucinations in reports

Enabling automatic statistical analysis from structured reports
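Once reports are fully structured, descriptive statistics and nodule-level search reduce to simple operations over records. The following is a minimal sketch under assumed conventions: the record layout, the field names (`location`, `attenuation`, `size_mm`), and the helper functions are hypothetical and do not reproduce the paper's 27-field template or its retrieval prototype.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical structured reports; real records would carry many more fields.
REPORTS: List[Dict[str, Any]] = [
    {"report_id": "r1", "nodules": [
        {"location": "RUL", "attenuation": "solid", "size_mm": 6.0},
        {"location": "LLL", "attenuation": "ground-glass", "size_mm": 9.5},
    ]},
    {"report_id": "r2", "nodules": [
        {"location": "RUL", "attenuation": "part-solid", "size_mm": 12.0},
    ]},
    {"report_id": "r3", "nodules": []},
]


def attenuation_distribution(reports: List[Dict[str, Any]]) -> Counter:
    """Count nodules per attenuation category across all reports."""
    return Counter(n["attenuation"] for r in reports for n in r["nodules"])


def find_nodules(
    reports: List[Dict[str, Any]],
    location: Optional[str] = None,
    attenuation: Optional[str] = None,
    min_size_mm: float = 0.0,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (report_id, nodule) pairs matching every given criterion."""
    hits = []
    for report in reports:
        for nodule in report["nodules"]:
            if location is not None and nodule["location"] != location:
                continue
            if attenuation is not None and nodule["attenuation"] != attenuation:
                continue
            if nodule["size_mm"] < min_size_mm:
                continue
            hits.append((report["report_id"], nodule))
    return hits


distribution = attenuation_distribution(REPORTS)
large_rul = find_nodules(REPORTS, location="RUL", min_size_mm=8.0)
```

Searching at the nodule level rather than the report level is what makes a query such as "right upper lobe nodules of at least 8 mm" possible without re-reading free text.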

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic-template-constrained decoding method enhances LLMs

Open-source model generates fully-structured radiology reports locally

Achieves high accuracy without formatting errors or hallucinations
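The idea behind template-constrained output can be illustrated with a simplified, field-level sketch: every field may only take one of the values the template allows, and later fields are emitted or skipped depending on earlier answers (the "dynamic" part). This is not the paper's implementation. The template, the field names, and `keyword_scorer` (a toy stand-in for an LLM's score of each candidate value) are all assumptions, and a real system would apply the constraint to token probabilities during decoding.

```python
from typing import Callable, Dict, List

# Hypothetical template: allowed values per field, plus optional conditions
# on earlier fields that must hold before a field is emitted.
TEMPLATE: List[dict] = [
    {"field": "nodule_present", "options": ["yes", "no"]},
    {"field": "attenuation",
     "options": ["solid", "part-solid", "ground-glass"],
     "requires": {"nodule_present": "yes"}},
    {"field": "location",
     "options": ["RUL", "RML", "RLL", "LUL", "LLL"],
     "requires": {"nodule_present": "yes"}},
]

# Toy phrase lists used by the stand-in scorer (assumed, not from the paper).
KEYWORDS: Dict[str, List[str]] = {
    "yes": ["nodule"], "no": ["no nodule"],
    "solid": ["solid"], "part-solid": ["part-solid"],
    "ground-glass": ["ground-glass", "ground glass"],
    "RUL": ["right upper lobe"], "RML": ["right middle lobe"],
    "RLL": ["right lower lobe"], "LUL": ["left upper lobe"],
    "LLL": ["left lower lobe"],
}


def keyword_scorer(report: str, option: str) -> float:
    """Stand-in for a model score: total length of matched phrases."""
    text = report.lower()
    return float(sum(len(p) for p in KEYWORDS[option] if p in text))


def structure_report(
    report: str,
    template: List[dict] = TEMPLATE,
    scorer: Callable[[str, str], float] = keyword_scorer,
) -> Dict[str, str]:
    """Fill the template field by field, choosing only among allowed values."""
    output: Dict[str, str] = {}
    for spec in template:
        required = spec.get("requires", {})
        # Dynamic step: skip fields whose preconditions are not met.
        if any(output.get(k) != v for k, v in required.items()):
            continue
        # Constrained step: the result is always one of spec["options"].
        output[spec["field"]] = max(
            spec["options"], key=lambda option: scorer(report, option)
        )
    return output


with_nodule = structure_report("A 6 mm solid nodule in the right upper lobe.")
without_nodule = structure_report("No nodule identified.")
```

Because values are only ever selected from the template, the output cannot contain an unknown field or a free-form value, which is the property that rules out formatting errors in this sketch.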
