LEAVS: An LLM-based Labeler for Abdominal CT Supervision

📅 2025-03-17
🤖 AI Summary
Problem: Structured annotation of abdominal CT reports is difficult because the abdomen contains many organs, abnormality types are diverse, and clinical urgency must be assessed alongside presence; existing methods struggle to balance label granularity with reliability. Method: We propose LEAVS, a structured label generation system for abdominal CT that automatically annotates two dimensions (presence and urgency) for 7 abnormality categories across 9 organs. The approach uses a tree-structured chain-of-thought prompting framework that combines sentence-level extraction with multiple-choice questions and runs inference on a locally deployed large language model. Contribution/Results: The system achieves an average F1 score of 0.89, and its urgency labels are comparable to human annotations. The generated labels are used to train a single vision model that classifies several organs as normal or abnormal. We publicly release the source code and structured annotations for a public CT dataset containing over 1,000 CT volumes, supporting fine-grained, organ-specific supervised learning for abdominal imaging.

📝 Abstract
Extracting structured labels from radiology reports has been used to train vision models that simultaneously detect several types of abnormalities. However, existing work focuses mainly on the chest region. Few works have investigated abdominal radiology reports, owing to the more complex anatomy and the wider range of pathologies in the abdomen. We propose LEAVS (Large language model Extractor for Abdominal Vision Supervision). This labeler can annotate the certainty of presence and the urgency of seven types of abnormalities for nine abdominal organs on CT radiology reports. To ensure broad coverage, we chose abnormalities that encompass most of the finding types from CT reports. Our approach employs a specialized chain-of-thought prompting strategy for a locally run LLM using sentence extraction and multiple-choice questions in a tree-based decision system. We demonstrate that the LLM can extract several abnormality types across abdominal organs with an average F1 score of 0.89, significantly outperforming competing labelers and humans. Additionally, we show that extraction of urgency labels achieved performance comparable to human annotations. Finally, we demonstrate that the abnormality labels contain valuable information for training a single vision model that classifies several organs as normal or abnormal. We release our code and structured annotations for a public CT dataset containing over 1,000 CT volumes.
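To make the "average F1 score" in the abstract concrete, the following is a minimal sketch of how a per-label F1 could be computed and then averaged. The abstract does not say how the average was taken, so the unweighted (macro) mean used here is an assumption, and the helper names `f1_score` and `macro_f1` are hypothetical.

```python
from typing import Dict, List, Tuple


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """F1 for one binary label: 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # With no positives in either list, F1 is undefined; return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_label: Dict[str, Tuple[List[int], List[int]]]) -> float:
    """Unweighted mean of per-label F1 scores (assumed averaging scheme)."""
    scores = [f1_score(truth, pred) for truth, pred in per_label.values()]
    return sum(scores) / len(scores)
```

Here each dictionary key would be one organ and abnormality pair, mapped to its reference labels and the labeler's predictions over the same set of reports.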
Problem

Research questions and friction points this paper is trying to address.

Extract structured labels from abdominal CT radiology reports.
Detect and classify abnormalities in nine abdominal organs.
Train vision models using extracted abnormality and urgency labels.
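One way the extracted labels could supervise a multi-organ classifier is to collapse each organ's abnormality labels into a single normal/abnormal target. The sketch below is only an illustration, not the authors' pipeline: the organ names, the label strings, and the `organ_targets` helper are assumptions, and counting only "present" as abnormal is a simplification chosen here.

```python
from typing import Dict, List, Sequence

# Illustrative subset of organs; the paper covers nine, which are not listed here.
EXAMPLE_ORGANS = ("liver", "spleen", "pancreas")


def organ_targets(
    labels: Dict[str, Dict[str, str]],
    organs: Sequence[str] = EXAMPLE_ORGANS,
) -> List[int]:
    """Return one binary target per organ: 1 = abnormal, 0 = normal.

    `labels` maps organ -> {abnormality type -> presence label}.
    An organ is marked abnormal if any abnormality is labeled "present".
    Organs missing from `labels` are treated as normal (an assumption).
    """
    targets = []
    for organ in organs:
        findings = labels.get(organ, {})
        is_abnormal = any(value == "present" for value in findings.values())
        targets.append(1 if is_abnormal else 0)
    return targets
```

The resulting vector has a fixed order and length, so it can serve directly as the target of a multi-label classification head with one output per organ.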
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based labeler for abdominal CT reports
Chain-of-thought prompting with tree-based decisions
Extracts abnormality certainty and urgency labels
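The tree-based prompting idea can be sketched as a short sequence of dependent questions: first extract the sentences about an organ, then ask a multiple-choice question about presence, and only ask about urgency if the abnormality is present. This is a minimal sketch under stated assumptions, not the released LEAVS code: the prompt wording, the answer choices, and the names `ask_choice` and `label_finding` are hypothetical, and `ask_llm` stands in for any locally run model wrapped as a function from prompt to text.

```python
import re
from typing import Callable, Dict, List, Optional

LLM = Callable[[str], str]  # hypothetical interface: prompt in, answer text out

# Assumed answer sets; the paper's actual choices are not given in this summary.
PRESENCE_CHOICES = ["present", "absent", "uncertain"]
URGENCY_CHOICES = ["urgent", "not urgent", "uncertain"]


def split_sentences(report: str) -> List[str]:
    """Naive sentence splitter on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report) if s.strip()]


def ask_choice(ask_llm: LLM, question: str, choices: List[str]) -> str:
    """Pose a lettered multiple-choice question and map the reply to a choice."""
    letters = "ABCDEFGH"
    options = "\n".join(f"{letters[i]}) {c}" for i, c in enumerate(choices))
    answer = ask_llm(f"{question}\n{options}\nAnswer with one letter.").strip().upper()
    index = letters.find(answer[0]) if answer else -1
    # Fall back to the last choice ("uncertain") when the reply cannot be parsed.
    return choices[index] if 0 <= index < len(choices) else choices[-1]


def label_finding(
    report: str, organ: str, abnormality: str, ask_llm: LLM
) -> Dict[str, Optional[str]]:
    # Node 1: sentence extraction, keeping only sentences about this organ.
    relevant = [
        s for s in split_sentences(report)
        if ask_llm(
            f"Does this sentence describe the {organ}? Answer yes or no.\nSentence: {s}"
        ).strip().lower().startswith("yes")
    ]
    if not relevant:
        return {"presence": "not mentioned", "urgency": None}
    context = " ".join(relevant)

    # Node 2: multiple-choice question on presence.
    presence = ask_choice(
        ask_llm,
        f"Based on these sentences, is {abnormality} of the {organ} present?\n"
        f"Sentences: {context}",
        PRESENCE_CHOICES,
    )
    if presence != "present":
        return {"presence": presence, "urgency": None}

    # Node 3: urgency is only asked on the "present" branch of the tree.
    urgency = ask_choice(
        ask_llm,
        f"How urgent is the {abnormality} of the {organ}?\nSentences: {context}",
        URGENCY_CHOICES,
    )
    return {"presence": presence, "urgency": urgency}
```

Branching this way keeps each prompt short and means the urgency question is never asked for findings that are absent or not mentioned.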
Ricardo Bigolin Lanfredi
National Institutes of Health Clinical Center, Bethesda, MD 20892, USA
Zhuang Yan
Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Windreich Department of Artificial Intelligence and Human Health, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Mark Finkelstein
Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Praveen Thoppey Srinivasan Balamuralikrishna
National Institutes of Health Clinical Center, Bethesda, MD 20892, USA
Luke Krembs
Walter Reed National Military Medical Center, Bethesda, MD 20892, USA
Brandon Khoury
Walter Reed National Military Medical Center, Bethesda, MD 20892, USA
Arthi Reddy
Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Pritam Mukherjee
National Institutes of Health Clinical Center
machine learning for healthcare, medical imaging
Neil M. Rofsky
Department of Diagnostic, Molecular and Interventional Radiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Ronald M. Summers
National Institutes of Health Clinical Center, Bethesda, MD 20892, USA