Tracking Cancer Through Text: Longitudinal Extraction From Radiology Reports Using Open-Source Large Language Models

📅 2026-03-10

🤖 AI Summary

Radiology reports record longitudinal information, such as tumor burden and treatment response, as unstructured text, which makes automated analysis difficult. This work proposes a fully open-source, locally deployable pipeline built on the Qwen2.5-72B large language model. To the authors' knowledge, it is the first to extract and link target lesions, non-target lesions, and new lesions across time points in accordance with RECIST criteria while remaining usable in privacy-sensitive clinical environments. The system is integrated into the llm_extractinator framework and prioritizes data privacy, reproducibility, and clinical utility. Evaluated on 50 pairs of Dutch thorax/abdomen CT reports, the pipeline reached attribute-level accuracies of 93.7%, 94.9%, and 94.0% for target, non-target, and new lesions, respectively, indicating that open-source large language models can perform well on clinical longitudinal extraction tasks.

📝 Abstract

Radiology reports capture crucial longitudinal information on tumor burden, treatment response, and disease progression, yet their unstructured narrative format complicates automated analysis. While large language models (LLMs) have advanced clinical text processing, most state-of-the-art systems remain proprietary, limiting their applicability in privacy-sensitive healthcare environments. We present a fully open-source, locally deployable pipeline for longitudinal information extraction from radiology reports, implemented using the llm_extractinator framework. The system applies the qwen2.5-72b model to extract and link target, non-target, and new lesion data across time points in accordance with RECIST criteria. Evaluation on 50 Dutch CT Thorax/Abdomen report pairs yielded high extraction performance, with attribute-level accuracies of 93.7% for target lesions, 94.9% for non-target lesions, and 94.0% for new lesions. The approach demonstrates that open-source LLMs can achieve clinically meaningful performance in multi-timepoint oncology tasks while ensuring data privacy and reproducibility. These results highlight the potential of locally deployable LLMs for scalable extraction of structured longitudinal data from routine clinical text.
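
To make the reported metric concrete, here is a minimal Python sketch of how an attribute-level accuracy could be computed over linked lesion records. It is an illustration under stated assumptions, not the paper's evaluation code: the `TargetLesion` schema, its field names, the positional pairing of records, and the exact-match scoring rule are all hypothetical, and the abstract does not specify how the authors define or aggregate the metric. The lesion values are invented for the example.

```python
from dataclasses import dataclass, fields
from typing import Optional, Sequence


@dataclass(frozen=True)
class TargetLesion:
    """Hypothetical record linking one target lesion across two time points."""
    lesion_id: str
    location: str
    baseline_mm: Optional[float]   # size in the earlier report
    followup_mm: Optional[float]   # size in the later report


def attribute_accuracy(predicted: Sequence[TargetLesion],
                       reference: Sequence[TargetLesion]) -> float:
    """Fraction of attribute values that exactly match the reference.

    Assumes records are already paired by position; a real evaluation
    would also need a rule for missed or spurious lesions.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be paired one-to-one")
    total = 0
    correct = 0
    for pred, ref in zip(predicted, reference):
        for f in fields(ref):
            total += 1
            if getattr(pred, f.name) == getattr(ref, f.name):
                correct += 1
    return correct / total if total else 0.0


# Invented example: the model gets one follow-up measurement wrong.
reference = [
    TargetLesion("T1", "right upper lobe", 24.0, 18.0),
    TargetLesion("T2", "liver segment 6", 31.0, 35.0),
]
predicted = [
    TargetLesion("T1", "right upper lobe", 24.0, 18.0),
    TargetLesion("T2", "liver segment 6", 31.0, 33.0),
]
score = attribute_accuracy(predicted, reference)
print(f"{score:.3f}")  # 7 of 8 attributes match
```

Scoring per attribute rather than per lesion gives partial credit when a lesion is found and linked correctly but one measurement is off, which is presumably why an attribute-level figure is reported for each lesion category.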

Problem

Research questions and friction points this paper is trying to address.

radiology reports

longitudinal information extraction

cancer tracking

unstructured clinical text

data privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

open-source LLM

longitudinal information extraction

radiology reports