UW-BioNLP at ChemoTimelines 2025: Thinking, Fine-Tuning, and Dictionary-Enhanced LLM Systems for Chemotherapy Timeline Extraction

📅 2025-12-04

🤖 AI Summary

This study addresses the construction of systemic anticancer treatment timelines for cancer patients from electronic health records (EHRs), a task in which chemotherapy events are hard to identify and recall of low-frequency drugs remains poor. The authors evaluate a two-step workflow that pairs large language models (LLMs) with domain knowledge: an LLM first extracts chemotherapy events from individual clinical notes, and an algorithm then normalizes and aggregates those events into patient-level timelines. The strategies compared are chain-of-thought thinking, supervised fine-tuning, direct preference optimization, and lookup against a chemotherapy drug dictionary. Several approaches were competitive on the shared-task test set leaderboard, and fine-tuned Qwen3-14B achieved the best official score of 0.678. The results suggest that fine-tuning and dictionary-based enhancement are both useful for fine-grained treatment timeline extraction from clinical text.

📝 Abstract

The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our methods, results, and findings for subtask 2: generating patient chemotherapy timelines from raw clinical notes. We evaluated strategies involving chain-of-thought thinking, supervised fine-tuning, direct preference optimization, and dictionary-based lookup to improve timeline extraction. All of our approaches followed a two-step workflow, wherein an LLM first extracted chemotherapy events from individual clinical notes, and then an algorithm normalized and aggregated events into patient-level timelines. The methods differed in how the associated LLM was utilized and trained. Multiple approaches yielded competitive performance on the test set leaderboard, with fine-tuned Qwen3-14B achieving the best official score of 0.678. Our results and analyses could provide useful insights for future attempts at this task as well as for the design of similar tasks.
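The second step of the workflow described above (normalizing and aggregating per-note events into patient-level timelines) can be illustrated with a minimal sketch. It is not the authors' algorithm: the event shape `(patient_id, drug, relation, date)`, the relation labels, and the function name `build_timelines` are assumptions made for illustration, and the per-note events are assumed to have been produced already by the LLM extraction step.

```python
from collections import defaultdict


def build_timelines(note_events):
    """Aggregate per-note events into one de-duplicated timeline per patient.

    note_events: iterable of (patient_id, drug, relation, date) tuples,
    where date is an ISO string such as "2011-03-15" (hypothetical format).
    """
    timelines = defaultdict(set)
    for patient_id, drug, relation, date in note_events:
        # Light normalization so the same event found in two notes collapses.
        timelines[patient_id].add((drug.strip().lower(), relation, date))
    # ISO dates sort chronologically as plain strings.
    return {
        pid: sorted(events, key=lambda e: (e[2], e[0], e[1]))
        for pid, events in timelines.items()
    }


# Hypothetical events extracted from three notes of two patients.
events = [
    ("p1", "Carboplatin ", "contains-1", "2011-03-15"),
    ("p1", "carboplatin", "contains-1", "2011-03-15"),
    ("p1", "paclitaxel", "begins-on", "2011-01-02"),
    ("p2", "cisplatin", "ends-on", "2012-07-30"),
]
timelines = build_timelines(events)
```

In this sketch the duplicate carboplatin event for `p1` is kept only once, and each patient's events come back in date order.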

Problem

Research questions and friction points this paper is trying to address.

Extract chemotherapy events from clinical notes

Normalize and aggregate events into patient timelines

Improve timeline extraction by comparing LLM prompting and training strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-thought thinking enhances LLM reasoning

Supervised fine-tuning and DPO optimize LLM performance

Dictionary-based lookup supplements LLM event extraction
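As a rough illustration of dictionary-based lookup, the sketch below scans a note for known drug names and maps each mention to a canonical name. How the authors actually built and applied their dictionary is not described here, so the tiny `CHEMO_DICT` mapping and the function `find_chemo_mentions` are hypothetical stand-ins.

```python
import re

# Tiny illustrative dictionary: surface form (lowercase) -> canonical drug name.
CHEMO_DICT = {
    "carboplatin": "carboplatin",
    "paraplatin": "carboplatin",
    "paclitaxel": "paclitaxel",
    "taxol": "paclitaxel",
}


def find_chemo_mentions(text, dictionary=CHEMO_DICT):
    """Return (mention, canonical_name) pairs for dictionary terms in text."""
    if not dictionary:
        return []
    # Longest terms first so a longer name wins over a shorter prefix.
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [(m.group(0), dictionary[m.group(0).lower()])
            for m in pattern.finditer(text)]


mentions = find_chemo_mentions("Started Taxol and Paraplatin today.")
```

A lookup of this kind could feed either the prompt (as candidate drugs) or a post-processing pass that normalizes names before aggregation; which of these the paper uses is not specified in this summary.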
