Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches

📅 2025-12-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current methods for identifying follow-up requirements for incidentalomas in radiology reports operate only at the document level, lacking lesion-level localization granularity. Method: We propose a novel large language model (LLM) inference paradigm integrating anatomy-aware prompting with lesion-specific markup inputs. We evaluate LLMs—including Llama 3.1-8B, GPT-4o, and GPT-OSS-20b—against supervised baselines (e.g., BioClinicalModernBERT) at the lesion level. Crucially, we incorporate anatomical structural priors into prompt design to enhance multi-site lesion localization and classification. Results: Anatomy-enhanced GPT-OSS-20b achieves a macro-F1 of 0.79, outperforming all supervised models; an integrated system further improves performance to 0.90—approaching inter-annotator agreement (Cohen’s κ = 0.92). This work delivers an interpretable, high-accuracy, fine-grained solution for clinical incidentaloma triage.

Technology Category

Application Category

📝 Abstract
Objective: To evaluate large language models (LLMs) against supervised baselines for fine-grained, lesion-level detection of incidentalomas requiring follow-up, addressing the limitations of current document-level classification systems. Methods: We utilized a dataset of 400 annotated radiology reports containing 1,623 verified lesion findings. We compared three supervised transformer-based encoders (BioClinicalModernBERT, ModernBERT, Clinical Longformer) against four generative LLM configurations (Llama 3.1-8B, GPT-4o, GPT-OSS-20b). We introduced a novel inference strategy using lesion-tagged inputs and anatomy-aware prompting to ground model reasoning. Performance was evaluated using class-specific F1-scores. Results: The anatomy-informed GPT-OSS-20b model achieved the highest performance, yielding an incidentaloma-positive macro-F1 of 0.79. This surpassed all supervised baselines (maximum macro-F1: 0.70) and closely matched the inter-annotator agreement of 0.76. Explicit anatomical grounding yielded statistically significant performance gains across GPT-based models (p < 0.05), while a majority-vote ensemble of the top systems further improved the macro-F1 to 0.90. Error analysis revealed that anatomy-aware LLMs demonstrated superior contextual reasoning in distinguishing actionable findings from benign lesions. Conclusion: Generative LLMs, when enhanced with structured lesion tagging and anatomical context, significantly outperform traditional supervised encoders and achieve performance comparable to human experts. This approach offers a reliable, interpretable pathway for automated incidental finding surveillance in radiology workflows.
Problem

Research questions and friction points this paper is trying to address.

Automated detection of incidentalomas needing follow-up in radiology reports
Evaluating LLMs versus supervised methods for lesion-level classification
Addressing limitations of document-level systems with anatomy-aware prompting
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used lesion-tagged inputs and anatomy-aware prompting for LLMs
Introduced a novel inference strategy to ground model reasoning
Applied a majority-vote ensemble to further improve performance
🔎 Similar Papers
No similar papers found.
Namu Park
Namu Park
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Farzad Ahmed
Farzad Ahmed
Department of Information Sciences and Technology, George Mason University, Fairfax, VA, USA
Z
Zhaoyi Sun
Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Kevin Lybarger
Kevin Lybarger
George Mason University
machine learningnatural language processinginformation extractionclinical informatics
E
Ethan Breinhorst
Department of Radiology, Te Whatu Ora Health New Zealand, Te Toka Tumai Auckland, Auckland, New Zealand
J
Julie Hu
Department of Radiology, Te Whatu Ora Health New Zealand, Te Toka Tumai Auckland, Auckland, New Zealand
Ozlem Uzuner
Ozlem Uzuner
George Mason University
Artificial intelligencenatural language processingmedical informatics
M
Martin Gunn
Department of Radiology, School of Medicine, University of Washington, Seattle, WA, USA
Meliha Yetisgen
Meliha Yetisgen
Professor, University of Washington
Natural language processinginformation extractioninformation retrievalclinical text processing