UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction

📅 2025-12-04
🤖 AI Summary
This study addresses the construction of systemic anticancer treatment timelines from the electronic health records (EHRs) of cancer patients, a task in which chemotherapy events are difficult to identify and recall for infrequently mentioned drugs is poor. The authors use a two-step framework: a large language model (LLM) first extracts chemotherapy events from individual clinical notes, and an algorithm then normalizes and aggregates those events into patient-level timelines. They evaluate chain-of-thought reasoning, supervised fine-tuning, and direct preference optimization, together with lookup against a chemotherapy drug dictionary. Several of these approaches were competitive on the test set leaderboard, and fine-tuned Qwen3-14B achieved the best official score of 0.678. The results and analyses are offered as guidance for future work on this task and for the design of similar tasks.

📝 Abstract
The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our methods, results, and findings for subtask 2 -- generating patient chemotherapy timelines from raw clinical notes. We evaluated strategies involving chain-of-thought thinking, supervised fine-tuning, direct preference optimization, and dictionary-based lookup to improve timeline extraction. All of our approaches followed a two-step workflow, wherein an LLM first extracted chemotherapy events from individual clinical notes, and then an algorithm normalized and aggregated events into patient-level timelines. Each specific method differed in how the associated LLM was utilized and trained. Multiple approaches yielded competitive performances on the test set leaderboard, with fine-tuned Qwen3-14B achieving the best official score of 0.678. Our results and analyses could provide useful insights for future attempts on this task as well as the design of similar tasks.
Problem

Research questions and friction points this paper is trying to address.

Extract chemotherapy events from clinical notes
Normalize and aggregate events into patient timelines
Improve timeline extraction using LLM strategies
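The second step listed above, normalizing and aggregating per-note events into patient timelines, can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the tuple layout `(patient_id, drug, relation, iso_date)`, the relation labels in the usage example, and the helper names `normalize_drug` and `build_timelines` are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize_drug(name: str) -> str:
    """Lowercase and collapse whitespace so surface variants compare equal."""
    return " ".join(name.lower().split())


def build_timelines(note_events):
    """Merge per-note events into one deduplicated timeline per patient.

    note_events: iterable of (patient_id, drug, relation, iso_date) tuples.
    This layout is hypothetical; the shared task's actual schema may differ.
    """
    timelines = defaultdict(set)
    for patient_id, drug, relation, iso_date in note_events:
        # A set drops events that repeat across notes after normalization.
        timelines[patient_id].add((normalize_drug(drug), relation, iso_date))
    # ISO dates sort chronologically as strings; ties break on drug, relation.
    return {
        pid: sorted(events, key=lambda e: (e[2], e[0], e[1]))
        for pid, events in timelines.items()
    }
```

For example, calling `build_timelines` on two events for the same patient that differ only in casing or trailing whitespace of the drug name yields a single timeline entry, which is the deduplication behavior a patient-level aggregation step would typically need.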
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thought thinking enhances LLM reasoning
Supervised fine-tuning and DPO optimize LLM performance
Dictionary-based lookup supplements event extraction accuracy
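As a rough illustration of how dictionary-based lookup might supplement LLM extraction, the sketch below scans a note for known drug names and reports the ones the model output missed. The page does not describe how the authors built or applied their dictionary, so the tiny `DRUG_DICT`, the matching strategy, and the function names are assumptions.

```python
import re

# Hypothetical mini-dictionary: surface form -> canonical drug name.
DRUG_DICT = {
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "adriamycin": "doxorubicin",
    "doxorubicin": "doxorubicin",
    "carboplatin": "carboplatin",
}

# One case-insensitive pattern with word boundaries; longer aliases first.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(a) for a in sorted(DRUG_DICT, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_dictionary_mentions(note_text):
    """Return the canonical names of dictionary drugs found in the note."""
    return {DRUG_DICT[m.group(1).lower()] for m in _PATTERN.finditer(note_text)}


def supplement(llm_mentions, note_text):
    """Return dictionary hits absent from the LLM output, sorted by name."""
    seen = {DRUG_DICT.get(m.lower(), m.lower()) for m in llm_mentions}
    return sorted(find_dictionary_mentions(note_text) - seen)
```

Mapping both brand and generic names to one canonical form means a mention the LLM reported under either name is not counted as missed. A real dictionary would also need to handle abbreviations and misspellings, which this sketch ignores.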
Tianmai M. Zhang
University of Washington
Zhaoyi Sun
University of Washington
Sihang Zeng
University of Washington
Biomedical Informatics, Machine Learning for Healthcare
Chenxi Li
University of Washington
Neil F. Abernethy
University of Washington
Barbara D. Lam
University of Washington
Fei Xia
University of Washington
Meliha Yetisgen
Professor, University of Washington
Natural language processing, information extraction, information retrieval, clinical text processing