🤖 AI Summary
This study addresses the construction of systemic anticancer treatment timelines from the electronic health records (EHRs) of cancer patients, a task in which chemotherapy events are hard to identify and recall for infrequently mentioned drugs remains poor. We use a two-stage extraction framework that pairs large language models (LLMs) with domain-specific knowledge: an LLM first extracts chemotherapy events from individual clinical notes, and an algorithm then normalizes and aggregates those events into patient-level timelines. We evaluate chain-of-thought reasoning, supervised fine-tuning, and direct preference optimization, together with a curated chemotherapy drug dictionary used for lookup and post-processing. The design targets rare drugs and complex temporal relationships. On the shared task's test set leaderboard, several approaches are competitive, and fine-tuned Qwen3-14B achieves the best official score of 0.678. The results suggest that both LLM training strategy and dictionary-based enhancement contribute to timeline quality, and the framework offers a reusable approach to fine-grained treatment timeline mining in clinical text.
📝 Abstract
The ChemoTimelines shared task benchmarks methods for constructing timelines of systemic anticancer treatment from electronic health records of cancer patients. This paper describes our methods, results, and findings for subtask 2: generating patient chemotherapy timelines from raw clinical notes. We evaluated strategies involving chain-of-thought reasoning, supervised fine-tuning, direct preference optimization, and dictionary-based lookup to improve timeline extraction. All of our approaches followed a two-step workflow, wherein an LLM first extracted chemotherapy events from individual clinical notes, and then an algorithm normalized and aggregated events into patient-level timelines. The methods differed in how the LLM was used and trained. Multiple approaches yielded competitive performance on the test set leaderboard, with fine-tuned Qwen3-14B achieving the best official score of 0.678. Our results and analyses could provide useful insights for future attempts on this task as well as the design of similar tasks.
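The second step of the workflow described above (normalizing and aggregating per-note events into a patient-level timeline) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the event shape `(mention, relation, iso_date)`, the relation labels, the tiny `DRUG_ALIASES` dictionary, and the function names are all assumptions made for illustration, and the LLM extraction step is represented only by its already-parsed output.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical alias dictionary (brand or generic mention -> generic name).
# A real chemotherapy dictionary would be far larger; this is illustrative only.
DRUG_ALIASES = {
    "taxol": "paclitaxel",
    "paclitaxel": "paclitaxel",
    "adriamycin": "doxorubicin",
    "doxorubicin": "doxorubicin",
    "carboplatin": "carboplatin",
}


@dataclass(frozen=True)
class Event:
    drug: str       # normalized generic drug name
    relation: str   # temporal relation label (assumed strings such as "begins-on")
    when: date      # normalized calendar date


def normalize_drug(mention):
    """Map a raw drug mention to a dictionary entry, or None if it is unknown."""
    return DRUG_ALIASES.get(mention.strip().lower())


def build_timeline(note_events):
    """Aggregate per-note events into one deduplicated, date-sorted timeline.

    note_events: iterable of lists, one list per clinical note, each holding
    (mention, relation, iso_date) tuples as assumed output of the LLM step.
    """
    seen = set()
    for events in note_events:
        for mention, relation, iso_date in events:
            drug = normalize_drug(mention)
            if drug is None:
                continue  # dictionary lookup filters out non-chemotherapy mentions
            try:
                when = date.fromisoformat(iso_date)
            except ValueError:
                continue  # skip dates that could not be normalized
            seen.add(Event(drug, relation.strip().lower(), when))
    return sorted(seen, key=lambda e: (e.when, e.drug, e.relation))


notes = [
    [("Taxol", "begins-on", "2013-01-05"), ("aspirin", "contains", "2013-01-05")],
    [("paclitaxel ", "BEGINS-ON", "2013-01-05"),
     ("Adriamycin", "contains", "2012-12-01"),
     ("carboplatin", "contains", "sometime")],
]
timeline = build_timeline(notes)
for event in timeline:
    print(event.when.isoformat(), event.drug, event.relation)
```

In this sketch the dictionary does double duty: it collapses different surface forms of the same drug so that duplicates across notes merge, and it discards mentions that are not in the dictionary. Whether the authors' post-processing filters this strictly is not stated in the abstract.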